Abstract: We investigate the recovery of signals exhibiting
a sparse representation in a general (i.e., possibly redundant 
or incomplete) dictionary that are corrupted by additive noise 
admitting a sparse representation in another general dictionary. 
This setup covers a wide range of applications, such as image 
inpainting, super-resolution, signal separation, and recovery of 
signals that are impaired by, e.g., clipping, impulse noise, or 
narrowband interference. We present deterministic recovery 
guarantees based on a novel uncertainty relation for pairs of 
general dictionaries and we provide corresponding practicable 
recovery algorithms. The recovery guarantees we find depend 
on the signal and noise sparsity levels, on the coherence param- 
eters of the involved dictionaries, and on the amount of prior 
knowledge about the signal and noise support sets. 

Index Terms: Uncertainty relations, signal restoration, signal
separation, coherence-based recovery guarantees, $\ell_1$-norm
minimization, greedy algorithms.



I. Introduction 
We consider the problem of identifying the sparse vector
$\mathbf{x} \in \mathbb{C}^{N_a}$ from $M$ linear and non-adaptive
measurements collected in the vector

$$\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e} \tag{1}$$

where $\mathbf{A} \in \mathbb{C}^{M \times N_a}$ and
$\mathbf{B} \in \mathbb{C}^{M \times N_b}$ are known deterministic
and general (i.e., not necessarily of the same cardinality, and
possibly redundant or incomplete) dictionaries, and
$\mathbf{e} \in \mathbb{C}^{N_b}$
represents a sparse noise vector. The support set of e and the
corresponding nonzero entries can be arbitrary; in particular, 
e may also depend on x and/or the dictionary A. 

This recovery problem occurs in many applications, some 
of which are described next: 

• Clipping: Non-linearities in (power-)amplifiers or in
analog-to-digital converters often cause signal clipping
or saturation [2]. This impairment can be cast into the
signal model (1) by setting $\mathbf{B} = \mathbf{I}_M$, where
$\mathbf{I}_M$ denotes the $M \times M$ identity matrix, and
rewriting (1) as z = y + e with
$\mathbf{e} = g_a(\mathbf{y}) - \mathbf{y}$. Concretely, instead of
the M-dimensional signal vector y = Ax of interest, the device
in question delivers $g_a(\mathbf{y})$, where the function
$g_a(\cdot)$ realizes entry-wise signal clipping to the interval
$[-a, +a]$.
The vector e will be sparse, provided the clipping level is 
high enough. Furthermore, in this case the support set of e 
can be identified prior to recovery, by simply comparing 
the absolute values of the entries of y to the clipping 
threshold a. Finally, we note that here it is essential that 
the noise vector e be allowed to depend on the vector x 
and/or the dictionary A. 

• Impulse noise: In numerous applications, one has to
deal with the recovery of signals corrupted by impulse
noise [3]. Specific applications include, e.g., reading out
from unreliable memory [4] or recovery of audio signals
impaired by click/pop noise, which typically occurs during
playback of old phonograph records. The model in (1)
is easily seen to incorporate such impairments. Just set
$\mathbf{B} = \mathbf{I}_M$ and let e be the impulse-noise vector.
We would like to emphasize the generality of (1), which allows
impulse noise that is sparse in general dictionaries B.
• Narrowband interference: In many applications one is
interested in recovering audio, video, or communication
signals that are corrupted by narrowband interference.
Electric hum, as it may occur in improperly designed
audio or video equipment, is a typical example of such
an impairment. Electric hum typically exhibits a sparse
representation in the Fourier basis as it (mainly) consists
of a tone at some base-frequency and a series of
corresponding harmonics, which is captured by setting
$\mathbf{B} = \mathbf{F}_M$ in (1), where $\mathbf{F}_M$ is the
M-dimensional discrete Fourier transform (DFT) matrix defined
below in (2).
• Super-resolution and inpainting: Our framework also
encompasses super-resolution and inpainting [5]-[7]
for images, audio, and video signals. In both applications,
only a subset of the entries of the (full-resolution) signal
vector y = Ax is available and the task is to fill in the
missing entries of the signal vector such that y = Ax.
The missing entries are accounted for by choosing the
vector e such that the entries of z = y + e corresponding
to the missing entries in y are set to some (arbitrary)
value, e.g., 0. The missing entries of y are then filled in
by first recovering x from z and then computing y = Ax.
Note that in both applications the support set $\mathcal{E}$ of e
is known (i.e., the locations of the missing entries can easily be
identified) and the dictionary A is typically redundant
(see, e.g., [8] for a corresponding discussion), i.e., A
has more dictionary elements (columns) than rows, which
demonstrates the need for recovery results that apply to
general (i.e., possibly redundant) dictionaries.
• Signal separation: Separation of (audio or video) signals
into two distinct components also fits into our framework.
A prominent example for this task is the separation of
texture from cartoon parts in images (see [9], [10] and
references therein). In the language of our setup, the
dictionaries A and B are chosen such that they allow
for sparse representation of the two distinct features;
x and e are the corresponding coefficients describing
these features (sparsely). Note that here the vector e
no longer plays the role of (undesired) noise. Signal
separation then amounts to simultaneously extracting the
sparse vectors x and e from the observation (e.g., the
image) z = Ax + Be.
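As an illustration of how one of these impairments fits the model (1), the following minimal sketch treats the clipping case with B equal to the identity. The signal values and the clipping level are made-up illustration values, and the helper name `clip` is an assumption of this sketch, not notation from the paper.

```python
# Sketch of the clipping model z = y + e with e = g_a(y) - y (B = identity).
# All numbers are hypothetical illustration values.

def clip(y, a):
    """Entry-wise clipping g_a of a real vector to the interval [-a, +a]."""
    return [max(-a, min(a, v)) for v in y]

y = [0.2, -1.7, 0.9, 1.3, -0.4, 0.1]   # hypothetical signal y = Ax
a = 1.0                                 # hypothetical clipping level

z = clip(y, a)                          # what the device delivers
e = [zi - yi for zi, yi in zip(z, y)]   # sparse "noise" vector e = g_a(y) - y

# The support of e can be read off the observation: entries with |z_i| = a.
support_from_z = [i for i, zi in enumerate(z) if abs(zi) == a]
support_of_e = [i for i, ei in enumerate(e) if ei != 0]
```

In this toy example only two of the six entries are clipped, so e is sparse, and its support is found by comparing the magnitudes of the delivered samples to the clipping level. An unclipped sample whose magnitude happens to equal a would also be flagged, which is harmless because the corresponding entry of e is zero.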
Naturally, it is of significant practical interest to identify
fundamental limits on the recovery of x (and e, if appropriate)
from z in (1). For the noiseless case z = Ax such recovery
guarantees are known [11]-[13] and typically set limits on the
maximum allowed number of nonzero entries of x or, more
colloquially, on the "sparsity" level of x. These recovery
guarantees are usually expressed in terms of restricted isometry
constants (RICs) [14], [15] or in terms of the coherence
parameter [11]-[13], [16] of the dictionary A. In contrast to
coherence parameters, RICs can, in general, not be computed
efficiently. In this paper, we focus exclusively on coherence-
based recovery guarantees. For the case of unstructured noise,
i.e., z = Ax + n with no constraints imposed on n apart
from $\|\mathbf{n}\|_2 < \infty$, coherence-based recovery
guarantees were derived in [16]-[20]. The corresponding results,
however, do not guarantee perfect recovery of x, but only ensure
that either the recovery error is bounded above by a function of
$\|\mathbf{n}\|_2$ or only guarantee perfect recovery of the
support set of x. Such results are to be expected, as a
consequence of the generality of the setup in terms of the
assumptions on the noise vector n.

A. Contributions 

In this paper, we consider the following questions: 1) Under 
which conditions can the vector x (and the vector e, if 
appropriate) be recovered perfectly from the (sparsely cor-
rupted) observation z = Ax + Be, and 2) can we formulate
practical recovery algorithms with corresponding (analytical) 
performance guarantees? Sparsity of the signal vector x and 
the error vector e will turn out to be key in answering 
these questions. More specifically, based on an uncertainty 
relation for pairs of general dictionaries, we establish recovery 
guarantees that depend on the number of nonzero entries in x 
and e, and on the coherence parameters of the dictionaries 
A and B. These recovery guarantees are obtained for the 
following different cases: I) The support sets of both x and 
e are known (prior to recovery), II) the support set of only
x or only e is known, III) the number of nonzero entries of
only x or only e is known, and IV) nothing is known about x
and e. We formulate efficient recovery algorithms and derive 
corresponding performance guarantees. Finally, we compare 
our analytical recovery thresholds to numerical results and we 
demonstrate the application of our algorithms and recovery 
guarantees to an image inpainting example. 



B. Outline of the paper 

The remainder of the paper is organized as follows. In Section II,
we briefly review relevant previous results. In Section III,
we derive a novel uncertainty relation that lays the
foundation for the recovery guarantees reported in Section IV.
A discussion of our results is provided in Section V and
numerical results are presented in Section VI. We conclude
in Section VII.



C. Notation 

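Lowercase boldface letters stand for column vectors and
uppercase boldface letters designate matrices. For the matrix
$\mathbf{M}$, we denote its transpose and conjugate transpose by
$\mathbf{M}^T$ and $\mathbf{M}^H$, respectively, its (Moore-Penrose)
pseudo-inverse by
$\mathbf{M}^\dagger = (\mathbf{M}^H\mathbf{M})^{-1}\mathbf{M}^H$,
its $k$th column by $\mathbf{m}_k$, and the entry in the $k$th row
and $\ell$th column by $[\mathbf{M}]_{k,\ell}$. The $k$th entry of
the vector $\mathbf{m}$ is $[\mathbf{m}]_k$. The space spanned by the
columns of $\mathbf{M}$ is denoted by $\mathcal{R}(\mathbf{M})$. The
$M \times M$ identity matrix is denoted by $\mathbf{I}_M$, the
$M \times N$ all zeros matrix by $\mathbf{0}_{M,N}$, and the
all-zeros vector of dimension $M$ by $\mathbf{0}_M$. The
$M \times M$ discrete Fourier transform matrix $\mathbf{F}_M$ is
defined as

$$[\mathbf{F}_M]_{k,\ell} = \frac{1}{\sqrt{M}} \exp\!\left(-\frac{2\pi i (k-1)(\ell-1)}{M}\right), \quad k,\ell = 1,\ldots,M \tag{2}$$

where $i^2 = -1$. The Euclidean (or $\ell_2$) norm of the vector
$\mathbf{x}$ is denoted by $\|\mathbf{x}\|_2$, $\|\mathbf{x}\|_1$
stands for the $\ell_1$-norm of $\mathbf{x}$, and $\|\mathbf{x}\|_0$
designates the number of nonzero entries in $\mathbf{x}$. Throughout
the paper, we assume that the columns of the dictionaries A and
B have unit $\ell_2$-norm. The minimum and maximum eigenvalue
of the positive-semidefinite matrix $\mathbf{M}$ are denoted by
$\lambda_{\min}(\mathbf{M})$ and $\lambda_{\max}(\mathbf{M})$,
respectively. The spectral norm of the matrix $\mathbf{M}$ is
$\|\mathbf{M}\| = \sqrt{\lambda_{\max}(\mathbf{M}^H\mathbf{M})}$.
Sets are designated by upper-case calligraphic letters; the
cardinality of the set $\mathcal{T}$ is $|\mathcal{T}|$.
The complement of a set $\mathcal{S}$ (in some superset
$\mathcal{T}$) is denoted by $\mathcal{S}^c$. For two sets
$\mathcal{S}_1$ and $\mathcal{S}_2$,
$s \in (\mathcal{S}_1 + \mathcal{S}_2)$ means that $s$ is of the
form $s = s_1 + s_2$, where $s_1 \in \mathcal{S}_1$ and
$s_2 \in \mathcal{S}_2$. The support set of the vector $\mathbf{m}$
is designated by $\mathrm{supp}(\mathbf{m})$. The matrix
$\mathbf{M}_\mathcal{T}$ is obtained from $\mathbf{M}$ by retaining
the columns of $\mathbf{M}$ with indices in $\mathcal{T}$; the vector
$\mathbf{m}_\mathcal{T}$ is obtained analogously. We define the
$N \times N$ diagonal (projection) matrix $\mathbf{P}_\mathcal{S}$
for the set $\mathcal{S} \subseteq \{1, \ldots, N\}$ as follows:

$$[\mathbf{P}_\mathcal{S}]_{k,\ell} = \begin{cases} 1, & k = \ell \text{ and } k \in \mathcal{S} \\ 0, & \text{otherwise.} \end{cases}$$

For $x \in \mathbb{R}$, we set $[x]^+ = \max\{x, 0\}$.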



II. Review of Relevant Previous Results 

Recovery of the vector x from the sparsely corrupted mea- 
surement z = Ax + Be corresponds to a sparse-signal re- 
covery problem subject to structured (i.e., sparse) noise. In 
this section, we briefly review relevant existing results for 
sparse-signal recovery from noiseless measurements, and we 
summarize the results available for recovery in the presence 
of unstructured and structured noise. 
A. Recovery in the noiseless case

Recovery of x from z = Ax where A is redundant (i.e.,
$M < N_a$) amounts to solving an underdetermined linear system
of equations. Hence, there are infinitely many solutions x,
in general. However, under the assumption of x being sparse,
the situation changes drastically. More specifically, one can
recover x from the observation z = Ax by solving

(P0) minimize $\|\mathbf{x}\|_0$ subject to $\mathbf{z} = \mathbf{A}\mathbf{x}$.

This approach results, however, in prohibitive computational
complexity, even for small problem sizes. Two of the most
popular and computationally tractable alternatives to solving
(P0) by an exhaustive search are basis pursuit (BP) [21]-[23]
and orthogonal matching pursuit (OMP) [25]. BP is essentially
a convex relaxation of (P0) and amounts to solving

(BP) minimize $\|\mathbf{x}\|_1$ subject to $\mathbf{z} = \mathbf{A}\mathbf{x}$.

OMP is a greedy algorithm that recovers the vector x by
iteratively selecting the column of A that is most "correlated"
with the difference between z and its current best (in
$\ell_2$-norm sense) approximation.
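The greedy selection rule just described can be sketched in a few lines. The following is a minimal, real-valued illustration written for this text, not the implementation used in the paper; the helper `solve`, the function name `omp`, and the 4 x 8 test dictionary (identity next to a normalized Hadamard matrix) are assumptions of the sketch.

```python
def solve(G, b):
    """Solve the square system G c = b by Gaussian elimination with pivoting."""
    n = len(b)
    T = [list(row) + [bi] for row, bi in zip(G, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(T[r][i]))
        T[i], T[p] = T[p], T[i]
        for r in range(i + 1, n):
            fac = T[r][i] / T[i][i]
            for c in range(i, n + 1):
                T[r][c] -= fac * T[i][c]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (T[i][n] - sum(T[i][j] * c[j] for j in range(i + 1, n))) / T[i][i]
    return c

def omp(A, z, n_iter):
    """OMP sketch for a real-valued dictionary A (list of rows) and observation z."""
    M, N = len(A), len(A[0])
    cols = [[A[k][l] for k in range(M)] for l in range(N)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    support, residual, coef = [], list(z), []
    for _ in range(n_iter):
        # select the column most correlated with the current residual
        best = max((l for l in range(N) if l not in support),
                   key=lambda l: abs(dot(cols[l], residual)))
        support.append(best)
        # best l2 approximation of z on the selected columns (normal equations)
        G = [[dot(cols[i], cols[j]) for j in support] for i in support]
        coef = solve(G, [dot(cols[i], z) for i in support])
        approx = [sum(c * cols[i][k] for c, i in zip(coef, support)) for k in range(M)]
        residual = [zk - ak for zk, ak in zip(z, approx)]
    x = [0.0] * N
    for c, i in zip(coef, support):
        x[i] = c
    return x

h = 0.5  # hypothetical dictionary: identity next to a normalized 4 x 4 Hadamard matrix
A = [[1, 0, 0, 0, h, h, h, h], [0, 1, 0, 0, h, -h, h, -h],
     [0, 0, 1, 0, h, h, -h, -h], [0, 0, 0, 1, h, -h, -h, h]]
z = [1.0, -1.0, 1.0, -1.0]   # z = A x for the 1-sparse x with x[5] = 2
x_hat = omp(A, z, n_iter=1)
```

Solving the least-squares step through the normal equations keeps the sketch dependency-free; a numerically careful implementation would typically use a QR-based update instead.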

The questions that arise naturally are: Under which conditions
does (P0) have a unique solution and when do BP
and/or OMP deliver this solution? To formulate the answer to
these questions, define $n_x = \|\mathbf{x}\|_0$ and the coherence
of the dictionary A as

$$\mu_a = \max_{k,\ell,\,k \neq \ell} \left|\mathbf{a}_k^H \mathbf{a}_\ell\right|. \tag{3}$$

As shown in [11]-[13], a sufficient condition for x to be the
unique solution of (P0) applied to z = Ax and for BP and
OMP to deliver this solution is

$$n_x < \frac{1}{2}\left(1 + \mu_a^{-1}\right). \tag{4}$$
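Since the coherence (3) only involves pairwise inner products of columns, it is straightforward to evaluate numerically. The sketch below computes it for a small, hypothetical dictionary (the identity concatenated with a normalized Hadamard matrix, chosen here purely for illustration) and evaluates the right-hand side of (4); the function name `coherence` is an assumption of this sketch.

```python
def coherence(A):
    """mu_a = max over k != l of |a_k^H a_l| for a matrix stored as a list of rows."""
    M, N = len(A), len(A[0])
    cols = [[complex(A[k][l]) for k in range(M)] for l in range(N)]
    inner = lambda u, v: sum(a.conjugate() * b for a, b in zip(u, v))
    return max(abs(inner(cols[k], cols[l]))
               for k in range(N) for l in range(N) if k != l)

# Hypothetical 4 x 8 dictionary with unit-norm columns.
h = 0.5
A = [[1, 0, 0, 0,  h,  h,  h,  h],
     [0, 1, 0, 0,  h, -h,  h, -h],
     [0, 0, 1, 0,  h,  h, -h, -h],
     [0, 0, 0, 1,  h, -h, -h,  h]]

mu_a = coherence(A)                 # identity vs. Hadamard columns give 1/2
threshold = 0.5 * (1 + 1 / mu_a)    # right-hand side of (4)
```

For this dictionary the coherence is 1/2, so (4) reads n_x < 1.5 and only 1-sparse vectors are covered by the guarantee.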



B. Recovery in the presence of unstructured noise

Coherence-based recovery guarantees in the presence of
unstructured (and deterministic) noise, i.e., for z = Ax + n, with
no constraints imposed on n apart from
$\|\mathbf{n}\|_2 < \infty$, were derived in [16]-[20] and the
references therein. Specifically, it was shown in [16] that a
suitably modified version of BP, referred to as BP denoising
(BPDN), recovers an estimate $\hat{\mathbf{x}}$ satisfying
$\|\mathbf{x} - \hat{\mathbf{x}}\|_2 < C\|\mathbf{n}\|_2$ provided
that (4) is met. Here, $C > 0$ depends on the coherence $\mu_a$ and
on the sparsity level $n_x$ of x. Note that the support set of the
estimate $\hat{\mathbf{x}}$ may differ from that of x. Another
result, reported in [17], states that OMP delivers the correct
support set (but does not perfectly recover the nonzero entries of
x) provided that

$$n_x < \frac{1}{2}\left(1 + \mu_a^{-1}\right) - \frac{\|\mathbf{n}\|_2}{\mu_a |x_{\min}|} \tag{5}$$

where $|x_{\min}|$ denotes the absolute value of the component of x
with smallest nonzero magnitude. The recovery condition (5)
yields sensible results only if $\|\mathbf{n}\|_2/|x_{\min}|$ is
small. Results similar to those reported in [17] were obtained in
[18], [19]. Recovery guarantees in the case of stochastic noise n
can be found in [19], [20]. We finally point out that perfect
recovery of x is, in general, impossible in the presence of
unstructured noise. In contrast, as we shall see below, perfect
recovery is possible under structured noise according to (1).

C. Recovery guarantees in the presence of structured noise

As outlined in the introduction, many practically relevant
signal recovery problems can be formulated as (sparse) signal
recovery from sparsely corrupted measurements, a problem
that seems to have received comparatively little attention in the
literature so far and does not appear to have been developed
systematically.

A straightforward way leading to recovery guarantees in the
presence of structured noise, as in (1), follows from rewriting
(1) as

$$\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e} = \mathbf{D}\mathbf{w} \tag{6}$$

with the concatenated dictionary
$\mathbf{D} = [\,\mathbf{A}\ \ \mathbf{B}\,]$ and the stacked
vector $\mathbf{w} = [\,\mathbf{x}^T\ \ \mathbf{e}^T\,]^T$. This
formulation allows us to invoke the recovery guarantee in (4) for
the concatenated dictionary D, which delivers a sufficient
condition for w (and hence, x and e) to be the unique solution of
(P0) applied to z = Dw and for BP and OMP to deliver this solution
[11]-[13]. However, the so obtained recovery condition

$$n_w = n_x + n_e < \frac{1}{2}\left(1 + \mu_d^{-1}\right) \tag{7}$$

with the dictionary coherence $\mu_d$ defined as

$$\mu_d = \max_{k,\ell,\,k \neq \ell} \left|\mathbf{d}_k^H \mathbf{d}_\ell\right| \tag{8}$$

ignores the structure of the recovery problem at hand, i.e., is
agnostic to i) the fact that D consists of the dictionaries A and
B with known coherence parameters $\mu_a$ and $\mu_b$, respectively,
and ii) knowledge about the support sets of x and/or e that
may be available prior to recovery. As shown in Section IV,
exploiting these two structural aspects of the recovery problem
yields superior (i.e., less restrictive) recovery thresholds. Note
that condition (7) guarantees perfect recovery of x (and e)
independent of the $\ell_2$-norm of the noise vector, i.e.,
$\|\mathbf{B}\mathbf{e}\|_2$ may be arbitrarily large. This is in
stark contrast to the recovery guarantees for noisy measurements
in [16] and (5) (originally reported in [17]).

Special cases of the general setup (1), explicitly taking into
account certain structural aspects of the recovery problem, were
considered in [3], [14], [26]-[30]. Specifically, in [26] it was
shown that for $\mathbf{A} = \mathbf{F}_M$,
$\mathbf{B} = \mathbf{I}_M$, and knowledge of the support set of e,
perfect recovery of the M-dimensional vector x is possible if

$$2 n_x n_e < M \tag{9}$$

where $n_e = \|\mathbf{e}\|_0$. In [27], [28], recovery guarantees
based on the RIC of the matrix A for the case where B is an
orthonormal basis (ONB), and where the support set of e
is either known or unknown, were reported; these recovery
guarantees are particularly handy when A is, for example,
i.i.d. Gaussian [31], [32]. However, results for the case of
A and B both general (and deterministic) dictionaries taking
into account prior knowledge about the support sets of x and
e seem to be missing in the literature. Recovery guarantees
for A i.i.d. non-zero mean Gaussian, $\mathbf{B} = \mathbf{I}_M$,
and the support sets of x and e unknown were reported in [29].
In [30], recovery guarantees under a probabilistic model on both x
and e and for unitary A and $\mathbf{B} = \mathbf{I}_M$ were
reported showing that x can be recovered perfectly with high
probability (and independently of the $\ell_2$-norm of x and e).
The problem of sparse-signal recovery in the presence of impulse
noise (i.e., $\mathbf{B} = \mathbf{I}_M$) was considered in [3],
where a particular nonlinear measurement process combined with a
non-convex program for signal recovery was proposed. In [14],
signal recovery in the presence of impulse noise based on
$\ell_1$-norm minimization was investigated. The setup in [14],
however, differs considerably from the one considered in this paper
as A in [14] needs to be tall (i.e., $M > N_a$) and the vector x to
be recovered is not necessarily sparse.
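To make the gap between the structure-agnostic condition (7) and the structure-aware condition (9) concrete, the following sketch evaluates both for the Fourier-identity pair, for which the concatenated dictionary has coherence 1/sqrt(M). The dimension M = 64, the choice of two corrupted entries, and the function names are illustration values assumed here; note also that (9) presupposes knowledge of the support set of e, whereas (7) does not.

```python
import math

M = 64                       # hypothetical signal dimension
mu_d = 1 / math.sqrt(M)      # coherence of D = [F_M  I_M]

def satisfies_7(n_x, n_e):
    """Condition (7): n_x + n_e < (1 + 1/mu_d) / 2."""
    return n_x + n_e < 0.5 * (1 + 1 / mu_d)

def satisfies_9(n_x, n_e):
    """Condition (9): 2 n_x n_e < M (support set of e known)."""
    return 2 * n_x * n_e < M

n_e = 2                      # hypothetical number of corrupted entries
max_nx_7 = max(n for n in range(M + 1) if satisfies_7(n, n_e))
max_nx_9 = max(n for n in range(M + 1) if satisfies_9(n, n_e))
```

With these numbers, (7) admits at most 2 nonzero signal entries while (9) admits up to 15, which illustrates why exploiting structure and support-set knowledge is worthwhile.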

We conclude this literature overview by noting that the 
present paper is inspired by [26]. Specifically, we note that
the recovery guarantee (9) reported in [26] is obtained from
an uncertainty relation that puts limits on how sparse a given 
signal can simultaneously be in the Fourier basis and in the 
identity basis. Inspired by this observation, we start our discus- 
sion by presenting an uncertainty relation for pairs of general 
dictionaries, which forms the basis for the recovery guarantees 
reported later in this paper. 

III. A General Uncertainty Relation for ε-Concentrated Vectors

We next present a novel uncertainty relation, which extends
the uncertainty relation in [33, Lem. 1] for pairs of
general dictionaries to vectors that are ε-concentrated rather
than perfectly sparse. As shown in Section IV, this extension
constitutes the basis for the derivation of recovery guarantees
for BP.



A. The uncertainty relation 

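Define the mutual coherence between the dictionaries A and
B as

$$\mu_m = \max_{k,\ell} \left|\mathbf{a}_k^H \mathbf{b}_\ell\right|.$$

Furthermore, we will need the following definition, which
appeared previously in [26].

Definition 1: A vector $\mathbf{r} \in \mathbb{C}^{N_r}$ is said to
be $\varepsilon_\mathcal{R}$-concentrated to the set
$\mathcal{R} \subseteq \{1, \ldots, N_r\}$ if
$\|\mathbf{P}_\mathcal{R}\mathbf{r}\|_1 \geq (1 - \varepsilon_\mathcal{R})\|\mathbf{r}\|_1$,
where $\varepsilon_\mathcal{R} \in [0, 1]$. We say that the vector
$\mathbf{r}$ is perfectly concentrated to the set $\mathcal{R}$ and,
hence, $|\mathcal{R}|$-sparse if
$\mathbf{P}_\mathcal{R}\mathbf{r} = \mathbf{r}$, i.e., if
$\varepsilon_\mathcal{R} = 0$.

We can now state the following uncertainty relation for pairs
of general dictionaries and for ε-concentrated vectors.

Theorem 1: Let $\mathbf{A} \in \mathbb{C}^{M \times N_a}$ be a
dictionary with coherence $\mu_a$,
$\mathbf{B} \in \mathbb{C}^{M \times N_b}$ a dictionary with
coherence $\mu_b$, and denote the mutual coherence between A and B
by $\mu_m$. Let $\mathbf{s}$ be a vector in $\mathbb{C}^M$ that can
be represented as a linear combination of columns of A and,
similarly, as a linear combination of columns of B. Concretely,
there exists a pair of vectors $\mathbf{p} \in \mathbb{C}^{N_a}$
and $\mathbf{q} \in \mathbb{C}^{N_b}$ such that
$\mathbf{s} = \mathbf{A}\mathbf{p} = \mathbf{B}\mathbf{q}$ (we
exclude the trivial case where $\mathbf{p} = \mathbf{0}_{N_a}$ and
$\mathbf{q} = \mathbf{0}_{N_b}$).¹ If $\mathbf{p}$ is
$\varepsilon_\mathcal{P}$-concentrated to $\mathcal{P}$ and
$\mathbf{q}$ is $\varepsilon_\mathcal{Q}$-concentrated to
$\mathcal{Q}$, then (10) holds:

$$|\mathcal{P}|\,|\mathcal{Q}| \geq \frac{\left[(1 + \mu_a)(1 - \varepsilon_\mathcal{P}) - |\mathcal{P}|\mu_a\right]^+ \left[(1 + \mu_b)(1 - \varepsilon_\mathcal{Q}) - |\mathcal{Q}|\mu_b\right]^+}{\mu_m^2}. \tag{10}$$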

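Proof: The proof follows closely that of [33, Lem. 1],
which applies to perfectly concentrated vectors p and q. We
therefore only summarize the modifications to the proof of
[33, Lem. 1]. Instead of using
$\sum_{p \in \mathcal{P}} |[\mathbf{p}]_p| = \|\mathbf{p}\|_1$ to
arrive at [33, Eq. 29]

$$\left[(1 + \mu_a) - |\mathcal{P}|\mu_a\right]^+ \|\mathbf{p}\|_1 \leq |\mathcal{P}|\,\mu_m \|\mathbf{q}\|_1$$

we invoke
$\sum_{p \in \mathcal{P}} |[\mathbf{p}]_p| \geq (1 - \varepsilon_\mathcal{P})\|\mathbf{p}\|_1$
to arrive at the following inequality valid for
$\varepsilon_\mathcal{P}$-concentrated vectors p:

$$\left[(1 + \mu_a)(1 - \varepsilon_\mathcal{P}) - |\mathcal{P}|\mu_a\right]^+ \|\mathbf{p}\|_1 \leq |\mathcal{P}|\,\mu_m \|\mathbf{q}\|_1. \tag{11}$$

Similarly, $\varepsilon_\mathcal{Q}$-concentration, i.e.,
$\sum_{q \in \mathcal{Q}} |[\mathbf{q}]_q| \geq (1 - \varepsilon_\mathcal{Q})\|\mathbf{q}\|_1$,
is used to replace [33, Eq. 30] by

$$\left[(1 + \mu_b)(1 - \varepsilon_\mathcal{Q}) - |\mathcal{Q}|\mu_b\right]^+ \|\mathbf{q}\|_1 \leq |\mathcal{Q}|\,\mu_m \|\mathbf{p}\|_1. \tag{12}$$

The uncertainty relation (10) is then obtained by multiplying
(11) and (12) and dividing the resulting inequality by
$\|\mathbf{p}\|_1\|\mathbf{q}\|_1$. ■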
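The quantities entering the uncertainty relation are easy to evaluate numerically. The sketch below computes the smallest concentration parameter of Definition 1 for a given vector and index set, and evaluates the right-hand side of (10). The example vector, the coherence values, and the function names are hypothetical and serve only to show the arithmetic; they do not correspond to an actual pair of dictionaries.

```python
def concentration_epsilon(r, R):
    """Smallest eps such that ||P_R r||_1 >= (1 - eps) ||r||_1, for nonzero r."""
    total = sum(abs(v) for v in r)
    return 1 - sum(abs(r[i]) for i in R) / total

def rhs_10(n_P, n_Q, eps_P, eps_Q, mu_a, mu_b, mu_m):
    """Right-hand side of (10); (10) states that n_P * n_Q is at least this value."""
    pos = lambda t: max(t, 0.0)
    return (pos((1 + mu_a) * (1 - eps_P) - n_P * mu_a)
            * pos((1 + mu_b) * (1 - eps_Q) - n_Q * mu_b) / mu_m ** 2)

p = [3.0, 0.0, -0.5, 4.0, 0.5]             # hypothetical coefficient vector
eps_P = concentration_epsilon(p, [0, 3])   # 0-based index set {0, 3}
bound = rhs_10(2, 2, eps_P, 0.0, 0.0, 0.0, 0.25)   # made-up coherence values
```

Here 7/8 of the l1-mass of p sits on the chosen set, so eps_P = 0.125, and allowing this imperfect concentration lowers the bound from 16 (perfect concentration) to 14.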

In the case where both p and q are perfectly concentrated,
i.e., $\varepsilon_\mathcal{P} = \varepsilon_\mathcal{Q} = 0$,
Theorem 1 reduces to the uncertainty relation reported in
[33, Lem. 1], which we restate next for the sake of completeness.

Corollary 2 ([33, Lem. 1]): If
$\mathcal{P} = \mathrm{supp}(\mathbf{p})$ and
$\mathcal{Q} = \mathrm{supp}(\mathbf{q})$, the following holds:

$$|\mathcal{P}|\,|\mathcal{Q}| \geq \frac{\left[1 - \mu_a(|\mathcal{P}| - 1)\right]^+ \left[1 - \mu_b(|\mathcal{Q}| - 1)\right]^+}{\mu_m^2}. \tag{13}$$

As detailed in [33], [34], the uncertainty relation in Corollary 2
generalizes the uncertainty relation for two orthonormal bases
(ONBs) found in [23]. Furthermore, it extends the uncertainty
relations provided in [35] for pairs of square dictionaries
(having the same number of rows and columns) to pairs of
general dictionaries A and B.

B. Tightness of the uncertainty relation 

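In certain special cases it is possible to find signals that
satisfy the uncertainty relation (10) with equality. As in [26],
consider $\mathbf{A} = \mathbf{F}_M$ and
$\mathbf{B} = \mathbf{I}_M$, so that $\mu_m = 1/\sqrt{M}$, and
define the comb signal containing equidistant spikes of unit
height as

$$[\boldsymbol{\delta}_t]_\ell = \begin{cases} 1, & \text{if } (\ell - 1) \bmod t = 0 \\ 0, & \text{otherwise} \end{cases}$$

where we shall assume that $t$ divides $M$. It can be shown
that the vectors $\mathbf{p} = \boldsymbol{\delta}_{\sqrt{M}}$ and
$\mathbf{q} = \boldsymbol{\delta}_{\sqrt{M}}$, both having
$\sqrt{M}$ nonzero entries, satisfy
$\mathbf{F}_M\mathbf{p} = \mathbf{I}_M\mathbf{q}$. If
$\mathcal{P} = \mathrm{supp}(\mathbf{p})$ and
$\mathcal{Q} = \mathrm{supp}(\mathbf{q})$, the vectors p and q are
perfectly concentrated to $\mathcal{P}$ and $\mathcal{Q}$,
respectively, i.e.,
$\varepsilon_\mathcal{P} = \varepsilon_\mathcal{Q} = 0$. Since
$|\mathcal{P}| = |\mathcal{Q}| = \sqrt{M}$ and
$\mu_m = 1/\sqrt{M}$, it follows that
$|\mathcal{P}|\,|\mathcal{Q}| = 1/\mu_m^2 = M$ and, hence,
$\mathbf{p} = \mathbf{q} = \boldsymbol{\delta}_{\sqrt{M}}$ satisfies
(10) with equality.

¹The uncertainty relation continues to hold if either
$\mathbf{p} = \mathbf{0}_{N_a}$ or $\mathbf{q} = \mathbf{0}_{N_b}$,
but does not apply to the trivial case
$\mathbf{p} = \mathbf{0}_{N_a}$ and $\mathbf{q} = \mathbf{0}_{N_b}$.
In all three cases we have $\mathbf{s} = \mathbf{0}_M$.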
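The comb-signal example can be checked numerically. The sketch below, written for this text with the hypothetical choice M = 16, applies the DFT matrix (with the 1/sqrt(M) normalization of (2)) to the comb with spacing sqrt(M) and confirms that the comb is reproduced, so that the product of the two support sizes equals M.

```python
import cmath
import math

M, t = 16, 4   # hypothetical dimension; t = sqrt(M) divides M
comb = [1.0 if l % t == 0 else 0.0 for l in range(M)]   # 0-based version of delta_t

# s = F_M * comb with [F_M]_{k,l} = exp(-2*pi*i*k*l/M) / sqrt(M), 0-based indices
s = [sum(cmath.exp(-2j * math.pi * k * l / M) * comb[l] for l in range(M)) / math.sqrt(M)
     for k in range(M)]

max_dev = max(abs(sk - ck) for sk, ck in zip(s, comb))   # distance between F_M p and q
n_p = sum(1 for v in comb if v != 0)                      # |P| = |Q| = sqrt(M)
mu_m = 1 / math.sqrt(M)                                   # mutual coherence of F_M and I_M
```

The deviation `max_dev` is at the level of floating-point rounding, and `n_p * n_p` equals `1 / mu_m**2 = M`, which is the equality case of (10).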



We will next show that for pairs of general dictionaries
A and B, finding signals that satisfy the uncertainty relation
(10) with equality is NP-hard. For the sake of simplicity, we
restrict ourselves to the case
$\mathcal{P} = \mathrm{supp}(\mathbf{p})$ and
$\mathcal{Q} = \mathrm{supp}(\mathbf{q})$, which implies
$|\mathcal{P}| = \|\mathbf{p}\|_0$ and
$|\mathcal{Q}| = \|\mathbf{q}\|_0$. Next, consider the problem

(U0) minimize $\|\mathbf{p}\|_0\|\mathbf{q}\|_0$ subject to $\mathbf{A}\mathbf{p} = \mathbf{B}\mathbf{q}$, $\|\mathbf{p}\|_0 \geq 1$, $\|\mathbf{q}\|_0 \geq 1$.

Since we are interested in the minimum of
$\|\mathbf{p}\|_0\|\mathbf{q}\|_0$ for nonzero vectors p and q, we
imposed the constraints $\|\mathbf{p}\|_0 \geq 1$ and
$\|\mathbf{q}\|_0 \geq 1$ to exclude the case where
$\mathbf{p} = \mathbf{0}_{N_a}$ and/or
$\mathbf{q} = \mathbf{0}_{N_b}$. Now, it follows that for the
particular choice $\mathbf{B} = \mathbf{z} \in \mathbb{C}^M$ and
hence $\mathbf{q} = q \in \mathbb{C} \setminus \{0\}$ (note that we
exclude the case $q = 0$ as a consequence of the requirement
$\|\mathbf{q}\|_0 \geq 1$) the problem (U0) reduces to

(U0*) minimize $\|\mathbf{x}\|_0$ subject to $\mathbf{A}\mathbf{x} = \mathbf{z}$

where $\mathbf{x} = \mathbf{p}/q$. However, as (U0*) is equivalent
to (P0), which is NP-hard [36], in general, we can conclude that
finding a pair p and q satisfying the uncertainty relation (10)
with equality is NP-hard.

IV. Recovery of Sparsely Corrupted Signals 

Based on the uncertainty relation in Theorem 1, we next
derive conditions that guarantee perfect recovery of x (and of
e, if appropriate) from the (sparsely corrupted) measurement
z = Ax + Be. These conditions will be seen to depend
on the number of nonzero entries of x and e, and on the
coherence parameters $\mu_a$, $\mu_b$, and $\mu_m$. Moreover, in
contrast to (5), the recovery conditions we find will not depend
on the $\ell_2$-norm of the noise vector
$\|\mathbf{B}\mathbf{e}\|_2$, which is hence allowed
to be arbitrarily large. We consider the following cases: I) The
support sets of both x and e are known (prior to recovery),
II) the support set of only x or only e is known, III) the
number of nonzero entries of only x or only e is known, and
IV) nothing is known about x and e. The uncertainty relation
in Theorem 1 is the basis for the recovery guarantees in all four
cases considered. To simplify notation, motivated by the form
of the right-hand side (RHS) of (13), we define the function

$$f(u, v) = \frac{\left[1 - \mu_a(u - 1)\right]^+ \left[1 - \mu_b(v - 1)\right]^+}{\mu_m^2}.$$

In the remainder of the paper, $\mathcal{X}$ denotes
$\mathrm{supp}(\mathbf{x})$ and $\mathcal{E}$ stands
for $\mathrm{supp}(\mathbf{e})$. We furthermore assume that the
dictionaries A and B are known perfectly to the recovery
algorithms. Moreover, we assume that $\mu_m > 0$.²

²If $\mu_m = 0$, the space spanned by the columns of A is
orthogonal to the space spanned by the columns of B. This makes
the separation of the components Ax and Be given z
straightforward. Once this separation is accomplished, x can be
recovered from Ax using (P0), BP, or OMP, if (5) is satisfied.



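A. Case I: Knowledge of $\mathcal{X}$ and $\mathcal{E}$

We start with the case where both $\mathcal{X}$ and $\mathcal{E}$
are known prior to recovery. The values of the nonzero entries of
x and e are unknown. This scenario is relevant, for example, in
applications requiring recovery of clipped band-limited signals
with known spectral support $\mathcal{X}$. Here, we would have
$\mathbf{A} = \mathbf{F}_M$, $\mathbf{B} = \mathbf{I}_M$, and
$\mathcal{E}$ can be determined as follows: Compare the
measurements $[\mathbf{z}]_i$, $i = 1, \ldots, M$, to the clipping
threshold $a$; if $|[\mathbf{z}]_i| = a$ add the corresponding
index $i$ to $\mathcal{E}$.

Recovery of x from z is then performed as follows. We first
rewrite the input-output relation in (1) as

$$\mathbf{z} = \mathbf{A}_\mathcal{X}\mathbf{x}_\mathcal{X} + \mathbf{B}_\mathcal{E}\mathbf{e}_\mathcal{E} = \mathbf{D}_{\mathcal{X},\mathcal{E}}\,\mathbf{s}_{\mathcal{X},\mathcal{E}}$$

with the concatenated dictionary
$\mathbf{D}_{\mathcal{X},\mathcal{E}} = [\,\mathbf{A}_\mathcal{X}\ \ \mathbf{B}_\mathcal{E}\,]$
and the stacked vector
$\mathbf{s}_{\mathcal{X},\mathcal{E}} = [\,\mathbf{x}_\mathcal{X}^T\ \ \mathbf{e}_\mathcal{E}^T\,]^T$.
Since $\mathcal{X}$ and $\mathcal{E}$ are known, we can recover the
stacked vector $\mathbf{s}_{\mathcal{X},\mathcal{E}}$ perfectly
and, hence, the nonzero entries of both x and e, if the
pseudo-inverse $\mathbf{D}_{\mathcal{X},\mathcal{E}}^\dagger$
exists. In this case, we can obtain
$\mathbf{s}_{\mathcal{X},\mathcal{E}}$ as

$$\mathbf{s}_{\mathcal{X},\mathcal{E}} = \mathbf{D}_{\mathcal{X},\mathcal{E}}^\dagger\,\mathbf{z}. \tag{14}$$

The following theorem states a sufficient condition for
$\mathbf{D}_{\mathcal{X},\mathcal{E}}$ to have full (column) rank,
which implies existence of the pseudo-inverse
$\mathbf{D}_{\mathcal{X},\mathcal{E}}^\dagger$. This condition
depends on the coherence parameters $\mu_a$, $\mu_b$, and $\mu_m$
of the involved dictionaries A and B and on $\mathcal{X}$ and
$\mathcal{E}$ through the cardinalities $|\mathcal{X}|$ and
$|\mathcal{E}|$, i.e., the number of nonzero entries in x and e,
respectively.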


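Theorem 3: Let z = Ax + Be with
$\mathcal{X} = \mathrm{supp}(\mathbf{x})$ and
$\mathcal{E} = \mathrm{supp}(\mathbf{e})$. Define
$n_x = \|\mathbf{x}\|_0$ and $n_e = \|\mathbf{e}\|_0$. If

$$n_x n_e < f(n_x, n_e) \tag{15}$$

then the concatenated dictionary
$\mathbf{D}_{\mathcal{X},\mathcal{E}} = [\,\mathbf{A}_\mathcal{X}\ \ \mathbf{B}_\mathcal{E}\,]$
has full (column) rank.

Proof: See Appendix A. ■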
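Recovery according to (14) amounts to solving a small least-squares problem once both support sets are known. The following sketch, assuming the Fourier-identity pair with M = 8 and made-up supports and coefficient values, forms the concatenated dictionary and computes the pseudo-inverse solution through the normal equations. The helper `solve` and all variable names are assumptions of this sketch.

```python
import cmath
import math

def solve(G, b):
    """Solve the square system G s = b by Gaussian elimination with pivoting."""
    n = len(b)
    T = [list(row) + [bi] for row, bi in zip(G, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(T[r][i]))
        T[i], T[p] = T[p], T[i]
        for r in range(i + 1, n):
            fac = T[r][i] / T[i][i]
            for c in range(i, n + 1):
                T[r][c] -= fac * T[i][c]
    s = [0j] * n
    for i in reversed(range(n)):
        s[i] = (T[i][n] - sum(T[i][j] * s[j] for j in range(i + 1, n))) / T[i][i]
    return s

M = 8
X, E = [1, 3], [2]                   # known supports (0-based, hypothetical)
x_X, e_E = [1 + 2j, -0.5], [3.0]     # hypothetical nonzero entries

# Columns of D_{X,E} = [A_X  B_E] for A = F_M and B = I_M
dft_col = lambda l: [cmath.exp(-2j * math.pi * k * l / M) / math.sqrt(M) for k in range(M)]
unit_col = lambda l: [1.0 if k == l else 0.0 for k in range(M)]
cols = [dft_col(l) for l in X] + [unit_col(l) for l in E]

true_s = x_X + e_E
z = [sum(c[k] * v for c, v in zip(cols, true_s)) for k in range(M)]

# s = D^dagger z = (D^H D)^{-1} D^H z, computed via the normal equations
herm = lambda u, v: sum(complex(a).conjugate() * b for a, b in zip(u, v))
G = [[herm(ci, cj) for cj in cols] for ci in cols]
s_hat = solve(G, [herm(ci, z) for ci in cols])
err = max(abs(a - b) for a, b in zip(s_hat, true_s))
```

In this example n_x n_e = 2 is below f(n_x, n_e) = M = 8, so (15) holds, the Gram matrix is invertible, and the stacked vector is recovered up to rounding error even though the corruption is larger than the signal entries.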
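For the special case $\mathbf{A} = \mathbf{F}_M$ and
$\mathbf{B} = \mathbf{I}_M$ (so that $\mu_a = \mu_b = 0$ and
$\mu_m = 1/\sqrt{M}$) the recovery condition (15) reduces
to $n_x n_e < M$, a result obtained previously in [26]. Tightness
of (15) can be established by noting that the pairs
$\mathbf{x} = \lambda\boldsymbol{\delta}_{\sqrt{M}}$,
$\mathbf{e} = (1 - \lambda)\boldsymbol{\delta}_{\sqrt{M}}$ with
$\lambda \in (0, 1)$ and
$\mathbf{x}' = \lambda'\boldsymbol{\delta}_{\sqrt{M}}$,
$\mathbf{e}' = (1 - \lambda')\boldsymbol{\delta}_{\sqrt{M}}$ with
$\lambda' \neq \lambda$ and $\lambda' \in (0, 1)$ both satisfy (15)
with equality and lead to the same measurement outcome
$\mathbf{z} = \mathbf{F}_M\mathbf{x} + \mathbf{e} = \mathbf{F}_M\mathbf{x}' + \mathbf{e}'$.

It is interesting to observe that Theorem 3 yields a sufficient
condition on $n_x$ and $n_e$ for any
$(M - n_e) \times n_x$-submatrix of A to have full (column) rank.
To see this, consider the special case
$\mathbf{B} = \mathbf{I}_M$ and hence,
$\mathbf{D}_{\mathcal{X},\mathcal{E}} = [\,\mathbf{A}_\mathcal{X}\ \ \mathbf{I}_\mathcal{E}\,]$.
Condition (15) characterizes pairs $(n_x, n_e)$, for which all
matrices $\mathbf{D}_{\mathcal{X},\mathcal{E}}$ with
$n_x = |\mathcal{X}|$ and $n_e = |\mathcal{E}|$ are guaranteed to
have full (column) rank. Hence, the sub-matrix consisting of all
rows of $\mathbf{A}_\mathcal{X}$ with row index in
$\mathcal{E}^c$ must have full (column) rank as well. Since
the result holds for all support sets $\mathcal{X}$ and
$\mathcal{E}$ with $|\mathcal{X}| = n_x$ and
$|\mathcal{E}| = n_e$, all possible
$(M - n_e) \times n_x$-submatrices of A must have full (column)
rank.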

B. Case II: Only $\mathcal{X}$ or only $\mathcal{E}$ is known

Next, we find recovery guarantees for the case where either
only $\mathcal{X}$ or only $\mathcal{E}$ is known prior to recovery.

1) Recovery when $\mathcal{E}$ is known and $\mathcal{X}$ is
unknown: A prominent application for this setup is the recovery of
clipped band-limited signals [27], [37], where the signal's
spectral support, i.e., $\mathcal{X}$, is unknown. The support set
$\mathcal{E}$ can be identified as detailed previously in
Section IV-A. Further application examples for this setup include
inpainting and super-resolution [5]-[7] of signals that admit a
sparse representation in A (but with unknown support set
$\mathcal{X}$). The locations of the missing elements in y = Ax
are known (and correspond, e.g., to missing paint elements in
frescos), i.e., the set $\mathcal{E}$ can be determined prior to
recovery. Inpainting and super-resolution then amount to
reconstructing the vector x from the sparsely corrupted
measurement z = Ax + e and computing y = Ax.

The setting of $\mathcal{E}$ known and $\mathcal{X}$ unknown was
considered previously in [26] for the special case
$\mathbf{A} = \mathbf{F}_M$ and $\mathbf{B} = \mathbf{I}_M$.
The recovery condition (18) in Theorem 4 below extends the
result in [26, Thms. 5 and 9] to pairs of general dictionaries
A and B.
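Theorem 4: Let z = Ax + Be where
$\mathcal{E} = \mathrm{supp}(\mathbf{e})$ is known. Consider the
problem

$$(\mathrm{P0}, \mathcal{E}) \quad \text{minimize } \|\mathbf{x}\|_0 \ \text{ subject to } \mathbf{A}\mathbf{x} \in \left(\{\mathbf{z}\} + \mathcal{R}(\mathbf{B}_\mathcal{E})\right) \tag{16}$$

and the convex program

$$(\mathrm{BP}, \mathcal{E}) \quad \text{minimize } \|\mathbf{x}\|_1 \ \text{ subject to } \mathbf{A}\mathbf{x} \in \left(\{\mathbf{z}\} + \mathcal{R}(\mathbf{B}_\mathcal{E})\right). \tag{17}$$

If $n_x = \|\mathbf{x}\|_0$ and $n_e = \|\mathbf{e}\|_0$ satisfy

$$2 n_x n_e < f(2 n_x, n_e) \tag{18}$$

then the unique solution of (P0,$\mathcal{E}$) applied to
z = Ax + Be is given by x and (BP,$\mathcal{E}$) will deliver this
solution.

Proof: See Appendix B. ■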
Solving (P0,$\mathcal{E}$) requires a combinatorial search, which
results in prohibitive computational complexity even for moderate
problem sizes. The convex relaxation (BP,$\mathcal{E}$) can,
however, be solved more efficiently. Note that the constraint
$\mathbf{A}\mathbf{x} \in (\{\mathbf{z}\} + \mathcal{R}(\mathbf{B}_\mathcal{E}))$
reflects the fact that any error component
$\mathbf{B}_\mathcal{E}\mathbf{e}_\mathcal{E}$ yields consistency
on account of $\mathcal{E}$ known (by assumption). For $n_e = 0$
(i.e., the noiseless case) the recovery threshold (18) reduces to
$n_x < (1 + 1/\mu_a)/2$, which is the well-known recovery
threshold (4) guaranteeing recovery of the sparse vector x through
(P0) and BP applied to z = Ax. We finally note that RIC-based
guarantees for recovering x from z = Ax (i.e., recovery in the
absence of (sparse) corruptions) that take into account partial
knowledge of the signal support set $\mathcal{X}$ were developed
in [38], [39].
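The thresholds (15) and (18) can be evaluated by direct search over the sparsity level. The sketch below implements the function f(u, v) and returns the largest admissible n_x for a given n_e; the function names, the search limit, and the coherence values are assumptions chosen for illustration (the first two calls use the Fourier-identity pair with M = 64).

```python
import math

def f(u, v, mu_a, mu_b, mu_m):
    """f(u, v) = [1 - mu_a (u - 1)]^+ [1 - mu_b (v - 1)]^+ / mu_m^2."""
    pos = lambda t: max(t, 0.0)
    return pos(1 - mu_a * (u - 1)) * pos(1 - mu_b * (v - 1)) / mu_m ** 2

def largest_nx(n_e, mu_a, mu_b, mu_m, factor, n_max=1000):
    """Largest n_x with factor*n_x*n_e < f(factor*n_x, n_e).

    factor = 1 corresponds to condition (15), factor = 2 to condition (18).
    """
    ok = [n for n in range(1, n_max + 1)
          if factor * n * n_e < f(factor * n, n_e, mu_a, mu_b, mu_m)]
    return max(ok) if ok else 0

M = 64                                   # hypothetical dimension, A = F_M, B = I_M
mu_m = 1 / math.sqrt(M)
nx_case1 = largest_nx(4, 0.0, 0.0, mu_m, factor=1)         # (15): 4 n_x < 64
nx_case2 = largest_nx(4, 0.0, 0.0, mu_m, factor=2)         # (18): 8 n_x < 64
nx_noiseless = largest_nx(0, 0.25, 0.25, 0.25, factor=2)   # (18) with n_e = 0
```

The last call illustrates the reduction mentioned above: with n_e = 0 and a hypothetical coherence of 1/4, (18) admits n_x up to 2, in agreement with the threshold n_x < (1 + 1/mu_a)/2 = 2.5 from (4).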

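Tightness of (18) can be established by setting
$\mathbf{A} = \mathbf{F}_M$ and $\mathbf{B} = \mathbf{I}_M$.
Specifically, one can construct from the comb signals
$\boldsymbol{\delta}_{\sqrt{M}}$ and
$\boldsymbol{\delta}_{2\sqrt{M}}$ two pairs $\mathbf{x}$,
$\mathbf{e}$ and $\mathbf{x}'$, $\mathbf{e}'$ that both satisfy (18)
with equality. One can furthermore verify that x and x' are both in
the admissible set specified by the constraints in
(P0,$\mathcal{E}$) and (BP,$\mathcal{E}$) and
$\|\mathbf{x}'\|_0 = \|\mathbf{x}\|_0$,
$\|\mathbf{x}'\|_1 = \|\mathbf{x}\|_1$. Hence, (P0,$\mathcal{E}$)
and (BP,$\mathcal{E}$) both cannot distinguish between x and x'
based on the measurement outcome z. For a detailed discussion of
this example we refer to [34].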



Rather than solving (P0,£) or (BP,£), we may attempt to 
recover the vector x by exploiting more directly the fact that 
TZCBg) is known (since B and £ are assumed to be known) 
and projecting the measurement outcome z onto the orthog- 
onal complement of 71(3 g). This approach would eliminate 
the (sparse) noise component and leave us with a standard 
sparse-signal recovery problem for the vector x. We next show 
that this ansatz is guaranteed to recover the sparse vector 
X provided that condition ([T8| is satisfied. Let us detail the 
procedure. If the columns of B^ are linearly independent, the 
pseudo-inverse B^ exists, and the projector onto the orthogo- 
nal complement of TZ(B£) is given by 



(19) 



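Applying $\mathbf{R}_{\mathcal{E}}$ to the measurement outcome $\mathbf{z}$ yields

$$\mathbf{R}_{\mathcal{E}} \mathbf{z} = \mathbf{R}_{\mathcal{E}} (\mathbf{A}\mathbf{x} + \mathbf{B}_{\mathcal{E}} \mathbf{e}_{\mathcal{E}}) = \mathbf{R}_{\mathcal{E}} \mathbf{A} \mathbf{x} \triangleq \hat{\mathbf{z}} \tag{20}$$

where we used the fact that $\mathbf{R}_{\mathcal{E}} \mathbf{B}_{\mathcal{E}} = \mathbf{0}_{M, n_e}$. We are now left
with the standard problem of recovering $\mathbf{x}$ from the modified
measurement outcome $\hat{\mathbf{z}} = \mathbf{R}_{\mathcal{E}} \mathbf{A} \mathbf{x}$. What comes to mind first
is that computing the standard recovery threshold (4) for the
modified dictionary $\mathbf{R}_{\mathcal{E}} \mathbf{A}$ should provide us with a recovery
threshold for the problem of extracting $\mathbf{x}$ from $\hat{\mathbf{z}} = \mathbf{R}_{\mathcal{E}} \mathbf{A} \mathbf{x}$. It
turns out, however, that the columns of $\mathbf{R}_{\mathcal{E}} \mathbf{A}$ will, in general,
not have unit $\ell_2$-norm, an assumption underlying (4). What
comes to our rescue is that under condition (18) we have (as
shown in Theorem 5 below) $\|\mathbf{R}_{\mathcal{E}} \mathbf{a}_\ell\|_2 > 0$ for $\ell = 1, \ldots, N_a$.
We can, therefore, normalize the modified dictionary $\mathbf{R}_{\mathcal{E}} \mathbf{A}$ by
rewriting (20) as

$$\hat{\mathbf{z}} = \mathbf{R}_{\mathcal{E}} \mathbf{A} \boldsymbol{\Delta} \hat{\mathbf{x}} \tag{21}$$

where $\boldsymbol{\Delta}$ is the diagonal matrix with elements

$$[\boldsymbol{\Delta}]_{\ell,\ell} = \frac{1}{\|\mathbf{R}_{\mathcal{E}} \mathbf{a}_\ell\|_2}, \quad \ell = 1, \ldots, N_a,$$

and $\hat{\mathbf{x}} = \boldsymbol{\Delta}^{-1} \mathbf{x}$. Now, $\mathbf{R}_{\mathcal{E}} \mathbf{A} \boldsymbol{\Delta}$ plays the role of the dictionary
(with normalized columns) and $\hat{\mathbf{x}}$ is the unknown sparse vector
that we wish to recover. Obviously, $\mathrm{supp}(\hat{\mathbf{x}}) = \mathrm{supp}(\mathbf{x})$ and
$\mathbf{x}$ can be recovered from $\hat{\mathbf{x}}$ according to $\mathbf{x} = \boldsymbol{\Delta} \hat{\mathbf{x}}$. The
following theorem shows that (18) is sufficient to guarantee
the following: i) The columns of $\mathbf{B}_{\mathcal{E}}$ are linearly independent,
which guarantees the existence of $\mathbf{B}_{\mathcal{E}}^{\dagger}$, ii) $\|\mathbf{R}_{\mathcal{E}} \mathbf{a}_\ell\|_2 > 0$ for
$\ell = 1, \ldots, N_a$, and iii) no vector $\mathbf{x}' \in \mathbb{C}^{N_a}$ with $\|\mathbf{x}'\|_0 \le 2 n_x$
lies in the kernel of $\mathbf{R}_{\mathcal{E}} \mathbf{A}$. Hence, (18) enables perfect recovery
of $\mathbf{x}$ from (21).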

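Theorem 5: If (18) is satisfied, the unique solution of (P0)
applied to $\hat{\mathbf{z}} = \mathbf{R}_{\mathcal{E}} \mathbf{A} \boldsymbol{\Delta} \hat{\mathbf{x}}$ is given by $\hat{\mathbf{x}}$. Furthermore, BP and
OMP applied to $\hat{\mathbf{z}} = \mathbf{R}_{\mathcal{E}} \mathbf{A} \boldsymbol{\Delta} \hat{\mathbf{x}}$ are guaranteed to recover the
unique (P0)-solution.

Proof: See Appendix C. ■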

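Since condition (18) ensures that $[\boldsymbol{\Delta}]_{\ell,\ell} > 0$, $\ell = 1, \ldots, N_a$,
the vector $\mathbf{x}$ can be obtained from $\hat{\mathbf{x}}$ according to $\mathbf{x} = \boldsymbol{\Delta} \hat{\mathbf{x}}$.
Furthermore, (18) guarantees the existence of $\mathbf{B}_{\mathcal{E}}^{\dagger}$ and hence
the nonzero entries of $\mathbf{e}$ can be obtained from $\mathbf{x}$ as follows:

$$\mathbf{e}_{\mathcal{E}} = \mathbf{B}_{\mathcal{E}}^{\dagger} (\mathbf{z} - \mathbf{A}\mathbf{x}).$$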
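The projection-based procedure for known $\mathcal{E}$ can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the helper names (`hadamard`, `omp`, `recover_known_noise_support`) are hypothetical, the OMP routine is a plain textbook variant with a fixed iteration count, and the data are assumed real-valued. The example uses the Hadamard-identity pair with $M = 64$, $n_x = 2$, and $n_e = 3$, for which $2 n_x n_e = 12$ lies well below the value 64 that the right-hand side of (18) takes when $\mu_a = \mu_b = 0$ and $\mu_m = 1/\sqrt{64}$.

```python
import numpy as np


def hadamard(m):
    """Sylvester construction of an m x m Hadamard matrix (m a power of two)."""
    h = np.array([[1.0]])
    while h.shape[0] < m:
        h = np.block([[h, h], [h, -h]])
    return h


def omp(d, y, n_iter):
    """Plain OMP with a fixed iteration count (real-valued data only)."""
    support = []
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(n_iter):
        if np.linalg.norm(residual) < 1e-12:
            break  # nothing left to explain
        # Select the column most correlated with the current residual.
        support.append(int(np.argmax(np.abs(d.T @ residual))))
        coef, *_ = np.linalg.lstsq(d[:, support], y, rcond=None)
        residual = y - d[:, support] @ coef
    x = np.zeros(d.shape[1])
    x[support] = coef
    return x


def recover_known_noise_support(a, b, z, e_support, n_x):
    """Recover x and e when the noise support is known (sketch of (19)-(21))."""
    b_e = b[:, e_support]
    b_e_pinv = np.linalg.pinv(b_e)
    r = np.eye(a.shape[0]) - b_e @ b_e_pinv      # projector in (19)
    ra = r @ a
    delta = 1.0 / np.linalg.norm(ra, axis=0)     # diagonal entries of Delta
    x_hat = omp(ra * delta, r @ z, n_x)          # sparse recovery for (21)
    x = delta * x_hat                            # undo the normalization
    e = np.zeros(b.shape[1])
    e[e_support] = b_e_pinv @ (z - a @ x)        # nonzero entries of e
    return x, e


rng = np.random.default_rng(0)
m, n_x, n_e = 64, 2, 3
a = hadamard(m) / np.sqrt(m)                     # orthonormal Hadamard basis
b = np.eye(m)
x_true = np.zeros(m)
x_true[rng.choice(m, n_x, replace=False)] = rng.standard_normal(n_x)
e_support = np.sort(rng.choice(m, n_e, replace=False))
e_true = np.zeros(m)
e_true[e_support] = rng.standard_normal(n_e)
z = a @ x_true + b @ e_true

x_rec, e_rec = recover_known_noise_support(a, b, z, e_support, n_x)
```

Replacing `omp` by any other sparse solver leaves the projection and normalization steps unchanged, since these only modify the dictionary and the measurement vector.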



Footnote: If $\|\mathbf{R}_{\mathcal{E}} \mathbf{a}_\ell\|_2 > 0$ for $\ell = 1, \ldots, N_a$, then the matrix $\boldsymbol{\Delta}$ corresponds to
a one-to-one mapping.

Theorem 5 generalizes the results in [36, Thms. 5 and 9]
obtained for the special case $\mathbf{A} = \mathbf{F}_M$ and $\mathbf{B} = \mathbf{I}_M$ to pairs of
general dictionaries and additionally shows that OMP delivers
the correct solution provided that (18) is satisfied.

It follows from (21) that other sparse-signal recovery algorithms,
such as iterative thresholding-based algorithms [40],
CoSaMP [41], or subspace pursuit [42] can be applied to
recover $\hat{\mathbf{x}}$. Finally, we note that the idea of projecting the
measurement outcome onto the orthogonal complement of the
space spanned by the active columns of $\mathbf{B}$ and investigating
the effect on the RICs, instead of the coherence parameter
(as was done in Appendix C-C), was put forward in [37],
[43] along with RIC-based recovery guarantees that apply to
random matrices $\mathbf{A}$ and guarantee the recovery of $\mathbf{x}$ with high
probability (with respect to $\mathbf{A}$ and irrespective of the locations
of the sparse corruptions).

2) Recovery when $\mathcal{X}$ is known and $\mathcal{E}$ is unknown: A possible
application scenario for this situation is the recovery of
spectrally sparse signals with known spectral support that are
impaired by impulse noise with unknown impulse locations.

It is evident that this setup is formally equivalent to that
discussed in Section IV-B1, with the roles of $\mathbf{x}$ and $\mathbf{e}$ interchanged.
In particular, we may apply the projection matrix
$\mathbf{R}_{\mathcal{X}} = \mathbf{I}_M - \mathbf{A}_{\mathcal{X}} \mathbf{A}_{\mathcal{X}}^{\dagger}$ to the corrupted measurement outcome
$\mathbf{z}$ to obtain the standard recovery problem $\hat{\mathbf{z}}' = \mathbf{R}_{\mathcal{X}} \mathbf{B} \boldsymbol{\Delta}' \hat{\mathbf{e}}$,
where $\boldsymbol{\Delta}'$ is a diagonal matrix with diagonal elements $[\boldsymbol{\Delta}']_{\ell,\ell} =
1/\|\mathbf{R}_{\mathcal{X}} \mathbf{b}_\ell\|_2$. The corresponding unknown vector is given by
$\hat{\mathbf{e}} = (\boldsymbol{\Delta}')^{-1} \mathbf{e}$. The following corollary is a direct consequence
of Theorem 5.

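Corollary 6: Let $\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e}$ where $\mathcal{X} = \mathrm{supp}(\mathbf{x})$ is
known. If the number of nonzero entries in $\mathbf{x}$ and $\mathbf{e}$, i.e.,
$n_x = \|\mathbf{x}\|_0$ and $n_e = \|\mathbf{e}\|_0$, satisfy

$$2 n_x n_e < f(n_x, 2 n_e) \tag{22}$$

then the unique solution of (P0) applied to $\hat{\mathbf{z}}' = \mathbf{R}_{\mathcal{X}} \mathbf{B} \boldsymbol{\Delta}' \hat{\mathbf{e}}$ is
given by $\hat{\mathbf{e}} = (\boldsymbol{\Delta}')^{-1} \mathbf{e}$. Furthermore, BP and OMP applied
to $\hat{\mathbf{z}}' = \mathbf{R}_{\mathcal{X}} \mathbf{B} \boldsymbol{\Delta}' \hat{\mathbf{e}}$ recover the unique (P0)-solution.

Once we have $\hat{\mathbf{e}}$, the vector $\mathbf{e}$ can be obtained easily, since
$\mathbf{e} = \boldsymbol{\Delta}' \hat{\mathbf{e}}$, and the nonzero entries of $\mathbf{x}$ are given by

$$\mathbf{x}_{\mathcal{X}} = \mathbf{A}_{\mathcal{X}}^{\dagger} (\mathbf{z} - \mathbf{B}\mathbf{e}).$$

Since (22) ensures that the columns of $\mathbf{A}_{\mathcal{X}}$ are linearly independent,
the pseudo-inverse $\mathbf{A}_{\mathcal{X}}^{\dagger}$ is guaranteed to exist. Note
that tightness of the recovery condition (22) can be established
analogously to the case of $\mathcal{E}$ known and $\mathcal{X}$ unknown (discussed
in Section IV-B1).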

C. Case III: Cardinality of $\mathcal{E}$ or $\mathcal{X}$ known

We next consider the case where neither $\mathcal{X}$ nor $\mathcal{E}$ is known,
but knowledge of either $\|\mathbf{x}\|_0$ or $\|\mathbf{e}\|_0$ is available (prior to recovery).
An application scenario for $\|\mathbf{x}\|_0$ unknown and $\|\mathbf{e}\|_0$
known would be the recovery of a sparse pulse-stream with unknown
pulse locations from measurements that are corrupted
by electric hum with unknown base frequency but known number
of harmonics (e.g., determined by the base frequency of
the hum and the acquisition bandwidth of the system under
consideration). We state our main result for the case $n_e = \|\mathbf{e}\|_0$
known and $n_x = \|\mathbf{x}\|_0$ unknown. The case where $n_x$ is known
and $n_e$ is unknown can be treated similarly.

Footnote: Finding analytical recovery guarantees for these algorithms (iterative
thresholding, CoSaMP, and subspace pursuit) remains an
interesting open problem.

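Theorem 7: Let $\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e}$, define $n_x = \|\mathbf{x}\|_0$ and $n_e =
\|\mathbf{e}\|_0$, and assume that $n_e$ is known. Consider the problem

$$(\mathrm{P0}, n_e) \quad \begin{cases} \text{minimize} & \|\tilde{\mathbf{x}}\|_0 \\ \text{subject to} & \mathbf{A}\tilde{\mathbf{x}} \in \Bigl( \{\mathbf{z}\} + \bigcup_{\mathcal{E}' \in \mathscr{P}} \mathcal{R}(\mathbf{B}_{\mathcal{E}'}) \Bigr) \end{cases} \tag{23}$$

where $\mathscr{P} = \wp_{n_e}(\{1, \ldots, N_b\})$ denotes the set of subsets of
$\{1, \ldots, N_b\}$ of cardinality less than or equal to $n_e$. The unique
solution of $(\mathrm{P0}, n_e)$ applied to $\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e}$ is given by $\mathbf{x}$ if

$$4 n_x n_e < f(2 n_x, 2 n_e). \tag{24}$$

Proof: See Appendix D. ■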
We emphasize that the problem $(\mathrm{P0}, n_e)$ exhibits prohibitive
(concretely, combinatorial) computational complexity, in general.
Unfortunately, replacing the $\ell_0$-norm of $\tilde{\mathbf{x}}$ in the minimization
in (23) by the $\ell_1$-norm does not lead to a computationally
tractable alternative either, as the constraint $\mathbf{A}\tilde{\mathbf{x}} \in
( \{\mathbf{z}\} + \bigcup_{\mathcal{E}' \in \mathscr{P}} \mathcal{R}(\mathbf{B}_{\mathcal{E}'}) )$ specifies a non-convex set, in general.
Nevertheless, the recovery threshold in (24) is interesting as it
completes the picture on the impact of knowledge about the
support sets of $\mathbf{x}$ and $\mathbf{e}$ on the recovery thresholds. We refer
to Section V-A for a detailed discussion of this matter. Note,
though, that greedy recovery algorithms, such as OMP [13],
[24], [25], CoSaMP [41], or subspace pursuit [42], can be
modified to incorporate prior knowledge of the individual sparsity
levels of $\mathbf{x}$ and/or $\mathbf{e}$. Analytical recovery guarantees corresponding
to the resulting modified algorithms do not seem
to be available.

We finally note that tightness of (24) can be established
for $\mathbf{A} = \mathbf{F}_M$ and $\mathbf{B} = \mathbf{I}_M$. Specifically, consider the pair



, e' = 



jj and the alternative pair x' = ^2^/11^ 
S./T7- It can be shown that both x 



"VA/' ~ "2VM ' "VM 

and $\mathbf{x}'$ are in the admissible set of $(\mathrm{P0}, n_e)$ in (23), satisfy
$\|\mathbf{x}'\|_0 = \|\mathbf{x}\|_0$, and lead to the same measurement outcome $\mathbf{z}$.
Therefore, $(\mathrm{P0}, n_e)$ cannot distinguish between $\mathbf{x}$ and $\mathbf{x}'$ (we
refer to [34] for details).



D. Case IV: No knowledge about the support sets

Finally, we consider the case of no knowledge (prior to
recovery) about the support sets $\mathcal{X}$ and $\mathcal{E}$. A corresponding
application scenario would be the restoration of an audio signal
(whose spectrum is sparse with unknown support set) that is
corrupted by impulse noise, e.g., click or pop noise occurring
at unknown locations. Another typical application can be
found in the realm of signal separation, e.g., the decomposition
of images into two distinct features, i.e., into a part
that exhibits a sparse representation in the dictionary $\mathbf{A}$ and
another part that exhibits a sparse representation in $\mathbf{B}$. Decomposition
of the image $\mathbf{z}$ then amounts to performing sparse-signal
recovery based on $\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e}$ with no knowledge
about the support sets $\mathcal{X}$ and $\mathcal{E}$ available prior to recovery.
The individual image features are given by $\mathbf{A}\mathbf{x}$ and $\mathbf{B}\mathbf{e}$.

Recovery guarantees for this case follow from the results
in [33]. Specifically, by rewriting (1) as $\mathbf{z} = \mathbf{D}\mathbf{w}$ as in (6), we
can employ the recovery guarantees in [33], which are explicit
in the coherence parameters $\mu_a$ and $\mu_b$, and the dictionary
coherence $\mu_d$ of $\mathbf{D}$. For the sake of completeness, we restate
the following result from [33].

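Theorem 8 ([33, Thm. 2]): Let $\mathbf{z} = \mathbf{D}\mathbf{w}$ with $\mathbf{w} =
[\mathbf{x}^T \; \mathbf{e}^T]^T$ and $\mathbf{D} = [\mathbf{A} \; \mathbf{B}]$ with the coherence parameters
$\mu_a \le \mu_b$ and the dictionary coherence $\mu_d$ as defined in (8). A
sufficient condition for the vector $\mathbf{w}$ to be the unique solution
of (P0) applied to $\mathbf{z} = \mathbf{D}\mathbf{w}$ is

$$n_x + n_e = \|\mathbf{w}\|_0 < \frac{f(\hat{x}) + \hat{x}}{2} \tag{25}$$

where

$$f(x) = \frac{(1 + \mu_a)(1 + \mu_b) - x \mu_b (1 + \mu_a)}{x (\mu_d^2 - \mu_a \mu_b) + \mu_a (1 + \mu_b)}$$

and $\hat{x} = \min\{x_b, x_s\}$. Furthermore, $x_b = (1 + \mu_b)/(\mu_b + \mu_d^2)$
and

$$x_s = \begin{cases} \dfrac{1}{\mu_d}, & \text{if } \mu_a = \mu_b = \mu_d, \\[2ex] \dfrac{\mu_d \sqrt{(1 + \mu_a)(1 + \mu_b)} - \mu_a - \mu_a \mu_b}{\mu_d^2 - \mu_a \mu_b}, & \text{otherwise.} \end{cases}$$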

Obviously, once the vector $\mathbf{w}$ has been recovered, we can
extract $\mathbf{x}$ and $\mathbf{e}$. The following theorem, originally stated in
[33], guarantees that BP and OMP deliver the unique solution
of (P0) applied to $\mathbf{z} = \mathbf{D}\mathbf{w}$; the associated recovery
threshold, as shown in [33], is only slightly more restrictive
than that for (P0) in (25).

Theorem 9 ([33, Cor. 4]): A sufficient condition for BP
and OMP to deliver the unique solution of (P0) applied to
$\mathbf{z} = \mathbf{D}\mathbf{w}$ is given by



if /ife < fid and K{fid, fJ-b) > 1, 



n,„ < 



otherwise 



(26) 



with n,„ — Tlx + rie and 

5yj2nd iHb + 3/1^ + e) - 2fid - 2^h((5 + /i^) 



fi{fid,fJ^b) = 



where S = 1 + fib and e = 2\/2^ fid^l^b + A'rf)- 

We emphasize that both thresholds (25) and (26) are more
restrictive than those in (15), (18), (22), and (24) (see also Section
V-A), which is consistent with the intuition that additional
knowledge about the support sets $\mathcal{X}$ and $\mathcal{E}$ should lead to
higher recovery thresholds. Note that tightness of (25) and
(26) was established before in [44] and [33], respectively.



V. Discussion of the Recovery Guarantees

The aim of this section is to provide an interpretation of
the recovery guarantees found in Section IV. Specifically, we
discuss the impact of support-set knowledge on the recovery
thresholds we found, and we point out limitations of our results.



A. Factor of two in the recovery thresholds

Comparing the recovery thresholds (15), (18), (22), and (24)
(Cases I-III), we observe that the price to be paid for not
knowing the support set $\mathcal{X}$ or $\mathcal{E}$ is a reduction of the recovery
threshold by a factor of two (note that in Case III, both $\mathcal{X}$ and
$\mathcal{E}$ are unknown, but the cardinality of either $\mathcal{X}$ or $\mathcal{E}$ is known).
For example, consider the recovery thresholds (15) and (18).
For given $n_e \in [0, 1 + 1/\mu_b]$, solving (15) for $n_x$ yields

$$n_x < \frac{(1 + \mu_a)\bigl(1 - \mu_b (n_e - 1)\bigr)}{n_e (\mu_m^2 - \mu_a \mu_b) + \mu_a (1 + \mu_b)}.$$

Similarly, still assuming $n_e \in [0, 1 + 1/\mu_b]$ and solving (18) for $n_x$,
we get

$$n_x < \frac{1}{2} \, \frac{(1 + \mu_a)\bigl(1 - \mu_b (n_e - 1)\bigr)}{n_e (\mu_m^2 - \mu_a \mu_b) + \mu_a (1 + \mu_b)}.$$

Hence, knowledge of $\mathcal{X}$ prior to recovery allows for the recovery
of a signal with twice as many nonzero entries in $\mathbf{x}$
compared to the case where $\mathcal{X}$ is not known. This factor-of-two
penalty has the same roots as the well-known factor-of-two
penalty in spectrum-blind sampling [45]-[47]. Note that
the same factor-of-two penalty can be inferred from the RIC-based
recovery guarantees in [15], [39], when comparing the
recovery threshold specified in [39, Thm. 1] for signals where
partial support-set knowledge is available (prior to recovery)
to that given in [15, Thm. 1.1], which does not assume prior
support-set knowledge.
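The step from (15) to the explicit bound on $n_x$ can be spelled out as follows. This is a sketch under the assumption that $f(u, v) = [1 - \mu_a(u-1)]^+ [1 - \mu_b(v-1)]^+ / \mu_m^2$, as used in the appendices, and that both $[\cdot]^+$ operators are inactive; the shorthand $c$ is introduced here only for readability.

```latex
\begin{align*}
n_x n_e \mu_m^2 &< \bigl(1 + \mu_a - \mu_a n_x\bigr)\, c,
  \qquad c = 1 - \mu_b (n_e - 1) \\
n_x \bigl( n_e \mu_m^2 + \mu_a c \bigr) &< (1 + \mu_a)\, c \\
n_x \bigl( n_e (\mu_m^2 - \mu_a \mu_b) + \mu_a (1 + \mu_b) \bigr) &< (1 + \mu_a)\, c
  \qquad \text{since } \mu_a c = \mu_a (1 + \mu_b) - \mu_a \mu_b n_e \\
n_x &< \frac{(1 + \mu_a)\bigl(1 - \mu_b (n_e - 1)\bigr)}
            {n_e (\mu_m^2 - \mu_a \mu_b) + \mu_a (1 + \mu_b)}
\end{align*}
```

Repeating the same steps for (18), i.e., with $n_x$ replaced by $2 n_x$ on both sides, gives $2 n_x$ in place of $n_x$ in the last line, which is the factor of one half.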

We illustrate the factor-of-two penalty in Figs. 1 and 2,
where the recovery thresholds (15), (18), (22), (24), and (26)
are shown. In Fig. 1, we consider the case $\mu_a = \mu_b = 0$
and $\mu_m = 1/\sqrt{64}$. We can see that for $\mathcal{X}$ and $\mathcal{E}$ known the
threshold evaluates to $n_x n_e < 64$. When only $\mathcal{X}$ or $\mathcal{E}$ is known
we have $n_x n_e < 32$, and finally in the case where only $n_e$ is
known we get $n_x n_e < 16$. Note furthermore that in Case IV,
where no knowledge about the support sets is available, the
recovery threshold is more restrictive than in the case where
$n_e$ is known.
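The three values quoted for Fig. 1 can be checked numerically with a few lines of code. The sketch below assumes $f(u, v) = [1 - \mu_a(u-1)]^+ [1 - \mu_b(v-1)]^+ / \mu_m^2$ and takes (24) to have the form $4 n_x n_e < f(2 n_x, 2 n_e)$, which is consistent with the bound $n_x n_e < 16$ stated above; the function names and case labels are hypothetical.

```python
def f(u, v, mu_a, mu_b, mu_m):
    """Assumed threshold function [1 - mu_a(u-1)]^+ [1 - mu_b(v-1)]^+ / mu_m^2."""
    pos = lambda t: max(t, 0.0)
    return pos(1.0 - mu_a * (u - 1)) * pos(1.0 - mu_b * (v - 1)) / mu_m**2


def guaranteed(case, n_x, n_e, mu_a=0.0, mu_b=0.0, mu_m=0.125):
    """True if the analytical threshold of the given case is satisfied."""
    if case == "I":       # both supports known, condition (15)
        return n_x * n_e < f(n_x, n_e, mu_a, mu_b, mu_m)
    if case == "II-E":    # only the noise support known, condition (18)
        return 2 * n_x * n_e < f(2 * n_x, n_e, mu_a, mu_b, mu_m)
    if case == "II-X":    # only the signal support known, condition (22)
        return 2 * n_x * n_e < f(n_x, 2 * n_e, mu_a, mu_b, mu_m)
    if case == "III":     # only n_e known, assumed form of (24)
        return 4 * n_x * n_e < f(2 * n_x, 2 * n_e, mu_a, mu_b, mu_m)
    raise ValueError(case)


def largest_product(case, limit=100):
    """Largest product n_x * n_e on an integer grid that is still guaranteed."""
    return max(n_x * n_e
               for n_x in range(1, limit)
               for n_e in range(1, limit)
               if guaranteed(case, n_x, n_e))


products = {case: largest_product(case) for case in ("I", "II-E", "II-X", "III")}
```

With the default parameters ($\mu_a = \mu_b = 0$, $\mu_m = 1/8$) the largest admissible integer products are 63, 31, 31, and 15, matching the strict bounds 64, 32, and 16.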

In Fig. 2, we show the recovery thresholds for $\mu_a = 0.1258$,
$\mu_b = 0.1319$, and $\mu_m = 0.1321$. We see that all threshold
curves are straight lines. This behavior can be explained by
noting that (in contrast to the assumptions underlying Fig. 1)
the dictionaries $\mathbf{A}$ and $\mathbf{B}$ have $\mu_a, \mu_b > 0$ and the corresponding
recovery thresholds are essentially dominated by the
numerator of the RHS expressions in (15), (18), (22), and
(24), which depends on both $n_x$ and $n_e$. More concretely, if
$\mu_a = \mu_b = \mu_m = \mu_d > 0$, then the recovery threshold for
Case II (where the support set $\mathcal{E}$ is known) becomes

$$n_x < \frac{1}{2} \left( 1 + \frac{1}{\mu_d} - n_e \right) \tag{27}$$

which reflects the behavior observed in Fig. 2.

B. The square-root bottleneck



The recovery thresholds presented in Section IV hold for all
signal and noise realizations $\mathbf{x}$ and $\mathbf{e}$ and for all dictionary
pairs (with given coherence parameters). However, as is well-known
in the sparse-signal recovery literature, coherence-based
recovery guarantees are, in contrast to RIC-based recovery
guarantees, fundamentally limited by the so-called
square-root bottleneck [48]. More specifically, in the noiseless
case (i.e., for $\mathbf{e} = \mathbf{0}_{N_b}$), the threshold (4) states that recovery
can be guaranteed only for up to $\sqrt{M}$ nonzero entries in $\mathbf{x}$.
Put differently, for a fixed number of nonzero entries $n_x$ in
$\mathbf{x}$, i.e., for a fixed sparsity level, the number of measurements
$M$ required to recover $\mathbf{x}$ through (P0), BP, or OMP is on the
order of $n_x^2$.

As in the classical sparse-signal recovery literature, the
square-root bottleneck can be broken by performing a probabilistic
analysis [48]. This line of work, albeit interesting,
is outside the scope of the present paper and is further
investigated in [33], [49], [50].

Fig. 1. Recovery thresholds (15), (18), (22), (24), and (26) for $\mu_a = \mu_b = 0$,
and $\mu_m = 1/\sqrt{64}$. Note that the curves for "only $\mathcal{E}$ known" and "only $\mathcal{X}$
known" are on top of each other. (Horizontal axis: signal sparsity $n_x$.)

Fig. 2. Recovery thresholds (15), (18), (22), (24), and (26) for $\mu_a = 0.1258$,
$\mu_b = 0.1319$, and $\mu_m = 0.1321$. (Horizontal axis: signal sparsity $n_x$.)



C. Trade-off between $n_x$ and $n_e$

We next illustrate a trade-off between the sparsity levels
of $\mathbf{x}$ and $\mathbf{e}$. Following the procedure outlined in [12], [51],
we construct a dictionary $\mathbf{A}$ consisting of $A$ ONBs and a
dictionary $\mathbf{B}$ consisting of $B$ ONBs such that $\mu_a = \mu_b =
\mu_m = 1/\sqrt{M}$, where $A + B \le M + 1$ with $M = p^k$, $p$
prime, and $k \in \mathbb{N}^+$. Now, let us assume that the error sparsity
level scales according to $n_e = \alpha \sqrt{M}$ for some $0 \le \alpha \le 1$.
For the case where only $\mathcal{E}$ is known but $\mathcal{X}$ is unknown (Case
II), we find from (18) that any signal $\mathbf{x}$ with (order-wise)
$(1 - \alpha)\sqrt{M}/2$ non-zero entries (ignoring terms of order less
than $\sqrt{M}$) can be reconstructed. Hence, there is a trade-off
between the sparsity levels of $\mathbf{x}$ and $\mathbf{e}$ (here quantified through
the parameter $\alpha$), and both sparsity levels scale with $\sqrt{M}$.

VI. Numerical Results

We first report simulation results and compare them to the
corresponding analytical results in the paper. We will find
that even though the analytical thresholds are pessimistic in
general, they do reflect the numerically observed recovery
behavior correctly. In particular, we will see that the factor-of-two
penalty discussed in Section V-A can also be observed in
the numerical results. We then demonstrate, through a simple
inpainting example, that perfect signal recovery in the presence
of sparse errors is possible even if the corruptions are significant
(in terms of the $\ell_2$-norm of the sparse noise vector $\mathbf{e}$). In
all numerical results, OMP is performed with a predetermined
number of iterations [13], [24], [25], i.e., for Case II and
Case IV, we set the number of iterations to $n_x$ and $n_x + n_e$,
respectively. To implement BP, we employ SPGL1 [52], [53].

A. Impact of support-set knowledge on recovery thresholds

We first compare simulation results to the recovery thresholds
(15), (18), (22), and (26). For a given pair of dictionaries
$\mathbf{A}$ and $\mathbf{B}$ we generate signal vectors $\mathbf{x}$ and error vectors $\mathbf{e}$
as follows: We first fix $n_x$ and $n_e$, then the support sets
of the $n_x$-sparse vector $\mathbf{x}$ and the $n_e$-sparse vector $\mathbf{e}$ are
chosen uniformly at random among all possible support sets of
cardinality $n_x$ and $n_e$, respectively. Once the support sets have
been chosen, we generate the nonzero entries of $\mathbf{x}$ and $\mathbf{e}$ by
drawing from i.i.d. zero mean, unit variance Gaussian random
variables. For each pair of support-set cardinalities $n_x$ and $n_e$,
we perform 10000 Monte-Carlo trials and declare success of
recovery whenever the recovered vector $\tilde{\mathbf{x}}$ satisfies

$$\|\tilde{\mathbf{x}} - \mathbf{x}\|_2 < 10^{-3} \|\mathbf{x}\|_2. \tag{28}$$

We plot the 50% success-rate contour, i.e., the border between
the region of pairs $(n_x, n_e)$ for which (28) is satisfied in at
least 50% of the trials and the region where (28) is satisfied in
less than 50% of the trials. The recovered vector $\tilde{\mathbf{x}}$ is obtained
as follows:

• Case I: When $\mathcal{X}$ and $\mathcal{E}$ are both known, we perform
recovery according to (14).

• Case II: When either only $\mathcal{E}$ or only $\mathcal{X}$ is known, we
apply BP and OMP using the modified dictionary as
detailed in Theorem 5 and Corollary 6, respectively.

• Case IV: When neither $\mathcal{X}$ nor $\mathcal{E}$ is known, we apply BP
and OMP to the concatenated dictionary $\mathbf{D} = [\mathbf{A} \; \mathbf{B}]$ as
described in Theorem 9.

Fig. 3. Impact of support-set knowledge on the 50% success-rate contour
of OMP and BP for the Hadamard-identity pair.

Fig. 4. Impact of support-set knowledge on the 50% success-rate contour
of OMP and BP performing recovery in pairs of approximate ETFs each of
dimension 64 × 80. (Horizontal axis: signal sparsity.)

Note that for Case III, i.e., the case where the cardinality $n_e$ of
the support set $\mathcal{E}$ is known, as pointed out in Section IV-C,
we only have uniqueness results but no analytical recovery
guarantees, neither for BP nor for greedy recovery algorithms
that make use of the separate knowledge of $n_x$ or $n_e$ (whereas,
e.g., standard OMP makes use of knowledge of $n_x + n_e$,
rather than knowledge of $n_x$ and $n_e$ individually). This case
is, therefore, not considered in the simulation results below.
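The trial-generation procedure and the success criterion (28) can be sketched for Case I as follows. This is an illustrative sketch, not the authors' simulation code: it assumes that recovery according to (14) amounts to a least-squares fit on the active columns $[\mathbf{A}_{\mathcal{X}} \; \mathbf{B}_{\mathcal{E}}]$, uses far fewer trials than the 10000 reported in the text, and all function names are hypothetical.

```python
import numpy as np


def hadamard(m):
    """Sylvester construction of an m x m Hadamard matrix (m a power of two)."""
    h = np.array([[1.0]])
    while h.shape[0] < m:
        h = np.block([[h, h], [h, -h]])
    return h


def case_one_trial(a, b, n_x, n_e, rng):
    """One Monte-Carlo trial with both support sets known (Case I)."""
    # Supports drawn uniformly at random, nonzero entries i.i.d. N(0, 1).
    x_supp = rng.choice(a.shape[1], size=n_x, replace=False)
    e_supp = rng.choice(b.shape[1], size=n_e, replace=False)
    x = np.zeros(a.shape[1])
    e = np.zeros(b.shape[1])
    x[x_supp] = rng.standard_normal(n_x)
    e[e_supp] = rng.standard_normal(n_e)
    z = a @ x + b @ e
    # Assumed form of (14): least squares on the active columns.
    d_active = np.hstack([a[:, x_supp], b[:, e_supp]])
    w, *_ = np.linalg.lstsq(d_active, z, rcond=None)
    x_rec = np.zeros_like(x)
    x_rec[x_supp] = w[:n_x]
    # Success criterion (28).
    return np.linalg.norm(x_rec - x) < 1e-3 * np.linalg.norm(x)


def success_rate(a, b, n_x, n_e, trials, seed=0):
    """Fraction of successful trials for one pair (n_x, n_e)."""
    rng = np.random.default_rng(seed)
    return np.mean([case_one_trial(a, b, n_x, n_e, rng) for _ in range(trials)])


m = 64
a = hadamard(m) / np.sqrt(m)     # orthonormal Hadamard basis
b = np.eye(m)
rate_sparse = success_rate(a, b, n_x=3, n_e=4, trials=200)
rate_dense = success_rate(a, b, n_x=40, n_e=40, trials=50)
```

Sweeping `success_rate` over a grid of $(n_x, n_e)$ pairs and tracing the 0.5 level set gives a 50% success-rate contour of the kind plotted in Figs. 3 and 4. The two pairs evaluated here lie on opposite sides of it: $n_x n_e = 12$ satisfies the sufficient condition (15), while $n_x + n_e = 80$ exceeds $M$.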

1) Recovery performance for the Hadamard-identity pair
using BP and OMP: We take $M = 64$, let $\mathbf{A}$ be the Hadamard
ONB [54] and set $\mathbf{B} = \mathbf{I}_M$, which results in $\mu_a = \mu_b = 0$ and
$\mu_m = 1/\sqrt{M}$. Fig. 3 shows 50% success-rate contours, under
different assumptions of support-set knowledge. For perfect
knowledge of $\mathcal{X}$ and $\mathcal{E}$, we observe that the 50% success-rate
contour is at about $n_x + n_e \approx M$, which is significantly better
than the sufficient condition $n_x n_e < M$ (guaranteeing perfect
recovery) provided in (15). When either only $\mathcal{X}$ or only $\mathcal{E}$ is
known, the recovery performance is essentially independent of
whether $\mathcal{X}$ or $\mathcal{E}$ is known. This is also reflected by the analytical
thresholds (18) and (22) when evaluated for $\mu_a = \mu_b = 0$
(see also Fig. 1). Furthermore, OMP is seen to outperform BP.
When neither $\mathcal{X}$ nor $\mathcal{E}$ is known, OMP again outperforms BP.

It is interesting to see that the factor-of-two penalty discussed
in Section V-A is reflected in Fig. 3 (for $n_x = n_e$)
between Cases I and II. Specifically, we can observe that for
full support-set knowledge (Case I) the 50% success-rate is
achieved at $n_x = n_e \approx 31$. If either $\mathcal{X}$ or $\mathcal{E}$ only is known
(Case II), OMP achieves 50% success-rate at $n_x = n_e \approx 23$,
demonstrating a factor-of-two penalty since $31 \cdot 31 \approx 23 \cdot 23 \cdot 2$.
Note that the results from BP in Fig. 3 do not seem to reflect
the factor-of-two penalty. For lack of an efficient recovery
algorithm (making use of knowledge of $n_e$) we do not show
numerical results for Case III.

2) Impact of $\mu_a, \mu_b > 0$: We take $M = 64$ and generate
the dictionaries $\mathbf{A}$ and $\mathbf{B}$ as follows. Using the alternating
projection method described in [56], we generate an approximate
equiangular tight frame (ETF) for the $M$-dimensional signal space consisting of 160
columns. We split this frame into two sets of 80 elements
(columns) each and organize them in the matrices $\mathbf{A}$ and $\mathbf{B}$
such that the corresponding coherence parameters are given by
$\mu_a \approx 0.1258$, $\mu_b \approx 0.1319$, and $\mu_m \approx 0.1321$. Fig. 4 shows
the 50% success-rate contour under four different assumptions
of support-set knowledge. In the case where either only $\mathcal{X}$ or
only $\mathcal{E}$ is known and in the case where $\mathcal{X}$ and $\mathcal{E}$ are unknown,
we use OMP and BP for recovery. It is interesting to note that
the graphs for the cases where only $\mathcal{X}$ or only $\mathcal{E}$ is known are
symmetric with respect to the line $n_x = n_e$. This symmetry
is also reflected in the analytical thresholds (18) and (22) (see
also Fig. 2 and the discussion in Section V-A).

We finally note that in all cases considered above, the numerical
results show that recovery is possible for significantly
higher sparsity levels $n_x$ and $n_e$ than indicated by the corresponding
analytical thresholds (15), (18), (22), and (26) (see
also Figs. 1 and 2). The underlying reasons are i) the deterministic
nature of the results, i.e., the recovery guarantees
in (15), (18), (22), and (26) are valid for all dictionary pairs
(with given coherence parameters) and all signal and noise
realizations (with given sparsity level), and ii) we plot the 50%
success-rate contour, whereas the analytical results guarantee
perfect recovery in 100% of the cases.

B. Inpainting example

In transform coding one is typically interested in maximally
sparse representations of a given signal to be encoded [57]. In
our setting, this would mean that the dictionary $\mathbf{A}$ should be
chosen so that it leads to maximally sparse representations of
a given family of signals. We next demonstrate, however, that
in the presence of structured noise, the signal dictionary $\mathbf{A}$
should additionally be incoherent to the noise dictionary $\mathbf{B}$.

Footnote: For $\mathbf{A} = \mathbf{F}_M$ and $\mathbf{B} = \mathbf{I}_M$ it was proven in [55] that a set of columns
chosen randomly from both $\mathbf{A}$ and $\mathbf{B}$ is linearly independent (with high
probability) given that the total number of chosen columns, i.e., $n_x + n_e$
here, does not exceed a constant proportion of $M$.

Fig. 5. Recovery results using the DCT basis for the signal dictionary and the identity basis for the noise dictionary: (a) corrupted image (MSE = -11.2 dB); (b) recovery when $\mathcal{X}$ and $\mathcal{E}$ are known (MSE = -184.6 dB); (c) recovery when only $\mathcal{E}$ is known (MSE = -113.3 dB); (d) recovery for $\mathcal{X}$ and $\mathcal{E}$ unknown (MSE = -13.0 dB). (Picture origin: ETH Zurich/Esther Ramseier).

Fig. 6. Recovery results for the signal dictionary given by the Haar wavelet basis and the noise dictionary given by the identity basis: (a) corrupted image (MSE = -11.2 dB); (b) recovery when both $\mathcal{X}$ and $\mathcal{E}$ are known (MSE = -27.1 dB); (c) recovery when only $\mathcal{E}$ is known (MSE = -27.1 dB); (d) recovery for $\mathcal{X}$ and $\mathcal{E}$ unknown (MSE = -12.0 dB). (Picture origin: ETH Zurich/Esther Ramseier).



This extra requirement can lead to very different criteria for
designing transform bases (frames).

To illustrate this point, and to show that perfect recovery
can be guaranteed even when the $\ell_2$-norm of the noise
term $\mathbf{B}\mathbf{e}$ is large, we consider the recovery of a sparsely
corrupted 512×512-pixel grayscale image of the main building
of ETH Zurich. The dictionary $\mathbf{A}$ is taken to be either the
two-dimensional discrete cosine transform (DCT) or the Haar
wavelet decomposed on three octaves [58]. We first "sparsify"
the image by retaining the 15% largest entries of the image's
representation $\mathbf{x}$ in $\mathbf{A}$. We then corrupt (by overwriting with
text) 18.8% of the pixels in the sparsified image by setting
them to the brightest grayscale value; this means that the errors
are sparse in $\mathbf{B} = \mathbf{I}_M$ and that the noise is structured but may
have large $\ell_2$-norm. Image recovery is performed according to
(14) if $\mathcal{X}$ and $\mathcal{E}$ are known. (Note, however, that knowledge
of $\mathcal{X}$ is usually not available in inpainting applications.) We
use BP when only $\mathcal{E}$ is known and when neither $\mathcal{X}$ nor $\mathcal{E}$ is
known. The recovery results are evaluated by computing the
mean-square error (MSE) between the sparsified image and its
recovered version.

Figs. 5 and 6 show the corresponding results. As expected,
the MSE increases as the amount of knowledge about the
support sets decreases. More interestingly, we note that even
though Haar wavelets often yield a smaller approximation
error in classical transform coding compared to the DCT, here
the wavelet transform performs worse than the DCT. This
behavior is due to the fact that sparsity is not the only factor
determining the performance of a transform coding basis (or
frame) in the presence of structured noise. Rather, the mutual
coherence between the dictionary used to represent the signal
and that used to represent the structured noise becomes
highly relevant. Specifically, in the example at hand, we have
$\mu_m = 1/2$ for the Haar-wavelet and the identity basis, and
$\mu_m \approx 0.004$ for the DCT and the identity basis. The dependence
of the analytical thresholds (15), (18), (22), (24),
and (26) on the mutual coherence $\mu_m$ explains the performance
difference between the Haar wavelet basis and the DCT basis.
An intuitive explanation for this behavior is as follows: The
Haar-wavelet basis contains only four non-zero entries in the
columns associated to fine scales, which is reflected in the
high mutual coherence (i.e., $\mu_m = 1/2$) between the Haar-wavelet
basis and the identity basis. Thus, when projecting
onto the orthogonal complement of $(\mathbf{I}_M)_{\mathcal{E}}$, it is likely that
all non-zero entries of such columns are deleted, resulting
in columns of all zeros. Recovery of the corresponding non-zero
entries of $\mathbf{x}$ is thus not possible. In summary, we see
that the choice of the transform basis (frame) for a sparsely
corrupted signal should not only aim at sparsifying the signal
as much as possible but should also take into account the
mutual coherence between the transform basis (frame) and the
noise sparsity basis (frame).
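The two coherence values quoted above can be reproduced with a short computation. The sketch below rests on assumptions not spelled out in the text: the DCT is taken to be the orthonormal, separable two-dimensional DCT-II on 512×512 images, and the finest-scale Haar atoms are taken to be outer products of length-2 orthonormal Haar vectors. Since the noise dictionary is the identity, the mutual coherence equals the largest absolute entry of any signal-dictionary atom.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal 1-D DCT-II matrix; row k holds the k-th basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


n = 512
c = dct_matrix(n)

# A separable 2-D basis image is an outer product of two 1-D basis vectors,
# so its largest entry is the square of the largest 1-D entry.
mu_dct = np.max(np.abs(c)) ** 2

# Finest-scale 2-D Haar wavelet: four nonzero pixels of magnitude 1/2.
h = np.array([1.0, -1.0]) / np.sqrt(2.0)
mu_haar = np.max(np.abs(np.outer(h, h)))
```

Under these assumptions `mu_dct` comes out just below $2/512 \approx 0.0039$ and `mu_haar` equals $1/2$, in line with the values stated in the text.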

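VII. Conclusion

The setup considered in this paper, in its generality, appears
to be new and a number of interesting extensions are possible.
In particular, developing (coherence-based) recovery guarantees
for greedy algorithms such as CoSaMP [41] or subspace
pursuit [42] for all cases studied in the paper remains an interesting
open problem. Note that probabilistic recovery guarantees for
the case where nothing is known about the signal and noise
support sets (i.e., Case IV) readily follow from the results
in [33]. Probabilistic recovery guarantees for the other cases
studied in this paper are in preparation [50]. Furthermore,
an extension of the results in this paper that accounts for
measurement noise (in addition to sparse noise) and applies
to approximately sparse signals can be found in [59].

Appendix A
Proof of Theorem 3

We prove the full (column-)rank property of $\mathbf{D}_{\mathcal{X},\mathcal{E}}$ by showing
that under (15) there is a unique pair $(\mathbf{x}, \mathbf{e})$ with $\mathrm{supp}(\mathbf{x}) =
\mathcal{X}$ and $\mathrm{supp}(\mathbf{e}) = \mathcal{E}$ satisfying $\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e}$. Assume that
there exists an alternative pair $(\mathbf{x}', \mathbf{e}')$ such that $\mathbf{z} = \mathbf{A}\mathbf{x}' + \mathbf{B}\mathbf{e}'$
with $\mathrm{supp}(\mathbf{x}') \subseteq \mathcal{X}$ and $\mathrm{supp}(\mathbf{e}') \subseteq \mathcal{E}$ (i.e., the support sets of
$\mathbf{x}'$ and $\mathbf{e}'$ are contained in $\mathcal{X}$ and $\mathcal{E}$, respectively). This would
then imply that

$$\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e} = \mathbf{A}\mathbf{x}' + \mathbf{B}\mathbf{e}'$$

and thus

$$\mathbf{A}(\mathbf{x} - \mathbf{x}') = \mathbf{B}(\mathbf{e}' - \mathbf{e}).$$

Since both $\mathbf{x}$ and $\mathbf{x}'$ have support in $\mathcal{X}$ it follows that $\mathbf{x} -
\mathbf{x}'$ also has support in $\mathcal{X}$, which implies $\|\mathbf{x} - \mathbf{x}'\|_0 \le n_x$.
Similarly, we get $\|\mathbf{e}' - \mathbf{e}\|_0 \le n_e$. Defining $\mathbf{p} = \mathbf{x} - \mathbf{x}'$ and
$\mathcal{P} = \mathrm{supp}(\mathbf{x} - \mathbf{x}') \subseteq \mathcal{X}$, and, similarly, $\mathbf{q} = \mathbf{e}' - \mathbf{e}$ and
$\mathcal{Q} = \mathrm{supp}(\mathbf{e}' - \mathbf{e}) \subseteq \mathcal{E}$, we obtain the following chain of
inequalities:

$$\begin{aligned} n_x n_e \ge \|\mathbf{p}\|_0 \|\mathbf{q}\|_0 = |\mathcal{P}| |\mathcal{Q}| &\ge \frac{[1 - \mu_a(|\mathcal{P}| - 1)]^+ [1 - \mu_b(|\mathcal{Q}| - 1)]^+}{\mu_m^2} && (29) \\ &\ge \frac{[1 - \mu_a(n_x - 1)]^+ [1 - \mu_b(n_e - 1)]^+}{\mu_m^2} = f(n_x, n_e) && (30) \end{aligned}$$

where (29) follows by applying the uncertainty relation in Theorem 1
(with $\varepsilon_{\mathcal{P}} = \varepsilon_{\mathcal{Q}} = 0$ since both $\mathbf{p}$ and $\mathbf{q}$ are perfectly
concentrated to $\mathcal{P}$ and $\mathcal{Q}$, respectively) and (30) is a consequence
of $|\mathcal{P}| \le n_x$ and $|\mathcal{Q}| \le n_e$. Obviously, (30) contradicts
the assumption in (15), which completes the proof.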



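Appendix B
Proof of Theorem 4

We begin by proving that $\mathbf{x}$ is the unique solution of $(\mathrm{P0}, \mathcal{E})$
applied to $\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e}$. Assume that there exists an alternative
vector $\mathbf{x}'$ that satisfies $\mathbf{A}\mathbf{x}' \in (\{\mathbf{z}\} + \mathcal{R}(\mathbf{B}_{\mathcal{E}}))$ with
$\|\mathbf{x}'\|_0 \le n_x$. This would imply the existence of a vector $\mathbf{e}'$
with $\mathrm{supp}(\mathbf{e}') \subseteq \mathcal{E}$, such that

$$\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e} = \mathbf{A}\mathbf{x}' + \mathbf{B}\mathbf{e}'$$

and hence

$$\mathbf{A}(\mathbf{x} - \mathbf{x}') = \mathbf{B}(\mathbf{e}' - \mathbf{e}).$$

Since $\mathrm{supp}(\mathbf{e}) = \mathcal{E}$ and $\mathrm{supp}(\mathbf{e}') \subseteq \mathcal{E}$, we have $\mathrm{supp}(\mathbf{e}' - \mathbf{e}) \subseteq
\mathcal{E}$ and hence $\|\mathbf{e}' - \mathbf{e}\|_0 \le n_e$. Furthermore, since both $\mathbf{x}$ and
$\mathbf{x}'$ have at most $n_x$ nonzero entries, we have $\|\mathbf{x} - \mathbf{x}'\|_0 \le 2 n_x$.
Defining $\mathbf{p} = \mathbf{x} - \mathbf{x}'$ and $\mathcal{P} = \mathrm{supp}(\mathbf{x} - \mathbf{x}')$, and, similarly,
$\mathbf{q} = \mathbf{e}' - \mathbf{e}$ and $\mathcal{Q} = \mathrm{supp}(\mathbf{e}' - \mathbf{e}) \subseteq \mathcal{E}$, we obtain the following
chain of inequalities:

$$\begin{aligned} 2 n_x n_e \ge \|\mathbf{p}\|_0 \|\mathbf{q}\|_0 = |\mathcal{P}| |\mathcal{Q}| &\ge \frac{[1 - \mu_a(|\mathcal{P}| - 1)]^+ [1 - \mu_b(|\mathcal{Q}| - 1)]^+}{\mu_m^2} && (31) \\ &\ge \frac{[1 - \mu_a(2 n_x - 1)]^+ [1 - \mu_b(n_e - 1)]^+}{\mu_m^2} = f(2 n_x, n_e) && (32) \end{aligned}$$

where (31) follows by applying the uncertainty relation in Theorem 1
(with $\varepsilon_{\mathcal{P}} = \varepsilon_{\mathcal{Q}} = 0$ since both $\mathbf{p}$ and $\mathbf{q}$ are perfectly
concentrated to $\mathcal{P}$ and $\mathcal{Q}$, respectively) and (32) is a
consequence of $|\mathcal{P}| \le 2 n_x$ and $|\mathcal{Q}| \le n_e$. Obviously, (32)
contradicts the assumption in (18), which concludes the first
part of the proof.

We next prove that $\mathbf{x}$ is also the unique solution of $(\mathrm{BP}, \mathcal{E})$
applied to $\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e}$. Assume that there exists an alternative
vector $\mathbf{x}'$ that satisfies $\mathbf{A}\mathbf{x}' \in (\{\mathbf{z}\} + \mathcal{R}(\mathbf{B}_{\mathcal{E}}))$ with
$\|\mathbf{x}'\|_1 \le \|\mathbf{x}\|_1$. This would imply the existence of a vector $\mathbf{e}'$
with $\mathrm{supp}(\mathbf{e}') \subseteq \mathcal{E}$, such that

$$\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e} = \mathbf{A}\mathbf{x}' + \mathbf{B}\mathbf{e}'$$

and hence

$$\mathbf{A}(\mathbf{x} - \mathbf{x}') = \mathbf{B}(\mathbf{e}' - \mathbf{e}).$$

Defining $\mathbf{p} = \mathbf{x} - \mathbf{x}'$, we obtain the following lower bound
for the $\ell_1$-norm of $\mathbf{x}'$:

$$\begin{aligned} \|\mathbf{x}'\|_1 = \|\mathbf{x} - \mathbf{p}\|_1 &= \|\mathbf{P}_{\mathcal{X}}(\mathbf{x} - \mathbf{p})\|_1 + \|\mathbf{P}_{\mathcal{X}^c} \mathbf{p}\|_1 \\ &\ge \|\mathbf{P}_{\mathcal{X}} \mathbf{x}\|_1 - \|\mathbf{P}_{\mathcal{X}} \mathbf{p}\|_1 + \|\mathbf{P}_{\mathcal{X}^c} \mathbf{p}\|_1 && (33) \\ &= \|\mathbf{x}\|_1 - \|\mathbf{P}_{\mathcal{X}} \mathbf{p}\|_1 + \|\mathbf{P}_{\mathcal{X}^c} \mathbf{p}\|_1 \end{aligned}$$

where (33) is a consequence of the reverse triangle inequality.
Now, the $\ell_1$-norm of $\mathbf{x}'$ can be smaller than or equal to that of
$\mathbf{x}$ only if $\|\mathbf{P}_{\mathcal{X}} \mathbf{p}\|_1 \ge \|\mathbf{P}_{\mathcal{X}^c} \mathbf{p}\|_1$. This would then imply that
the difference vector $\mathbf{p}$ needs to be at least 50%-concentrated
to the set $\mathcal{P} = \mathcal{X}$ (of cardinality $n_x$), i.e., we require that
$\varepsilon_{\mathcal{P}} \le 0.5$. Defining $\mathbf{q} = \mathbf{e}' - \mathbf{e}$ and $\mathcal{Q} = \mathrm{supp}(\mathbf{e}' - \mathbf{e})$, and
noting that $\mathrm{supp}(\mathbf{e}) = \mathcal{E}$ and $\mathrm{supp}(\mathbf{e}') \subseteq \mathcal{E}$, it follows that
$|\mathcal{Q}| \le n_e$. This leads to the following chain of inequalities:

$$\begin{aligned} n_x n_e \ge |\mathcal{P}| |\mathcal{Q}| &\ge \frac{[(1 + \mu_a)(1 - \varepsilon_{\mathcal{P}}) - |\mathcal{P}| \mu_a]^+ [1 - \mu_b(|\mathcal{Q}| - 1)]^+}{\mu_m^2} && (34) \\ &\ge \frac{1}{2} \, \frac{[1 - \mu_a(2 n_x - 1)]^+ [1 - \mu_b(n_e - 1)]^+}{\mu_m^2} && (35) \end{aligned}$$

where (34) follows from the uncertainty relation in Theorem 1
applied to the difference vectors $\mathbf{p}$ and $\mathbf{q}$ (with $\varepsilon_{\mathcal{P}} \le 0.5$ since
$\mathbf{p}$ is at least 50%-concentrated to $\mathcal{P}$ and $\varepsilon_{\mathcal{Q}} = 0$ since $\mathbf{q}$ is
perfectly concentrated to $\mathcal{Q}$) and (35) is a consequence of
$|\mathcal{P}| = n_x$ and $|\mathcal{Q}| \le n_e$. Rewriting (35), we obtain

$$2 n_x n_e \ge \frac{[1 - \mu_a(2 n_x - 1)]^+ [1 - \mu_b(n_e - 1)]^+}{\mu_m^2} = f(2 n_x, n_e). \tag{36}$$

Since (36) contradicts the assumption in (18), this proves that
$\mathbf{x}$ is the unique solution of $(\mathrm{BP}, \mathcal{E})$ applied to $\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e}$.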



where ([39| follows from the Rayleigh-Ritz theorem | [60^. Thm. 
4.2.2] and (HOl) results from 



E 



bf aJ~ < Ueiii 



Next, applying Gersgorin's disc theorem ||60j Theorem 6.1.1], 
we arrive at 



(41) 



Combining (38i, (40 1, and (41 1 leads to the following lower 
bound on ||R£af||2: 



IRfa^ll" > 1 



(42) 



Note that if condition ( [T8] l holds fo:^ rix > 1, it follows 
that Ue /i^ < [1 - Ubine - 1)]^ and hence the RHS of (|42]) 
is strictly positive. This ensures that A defines a one-to-one 
mapping. We next show that, moreover, condition ([TSj ensures 
that for every vector x' € C^" satisfying ||x'||q < 2nx, Ax' 
has a nonzero component that is orthogonal to TZ(B£). 



Appendix C 
Proof of Theorem|5] 

We first show that condition ( fTS] ) ensures that the columns 
of Bf: are linearly independent. Then, we establish that 



IRfa^lL > for i ^ 1, 



, Na ■ Finally, we show that the 



unique solution of (PO), BP, and OMP applied to z = Rf AAx 
is given by x = A^^x. 

A. The columns of $\mathbf{B}_{\mathcal{E}}$ are linearly independent

Condition (18) can only be satisfied if $[1 - \mu_b(n_e - 1)]^+ > 0$,
which implies that $n_e < 1 + 1/\mu_b$. It was shown in [11]-[13]
that for a dictionary $\mathbf{B}$ with coherence $\mu_b$ no fewer than
$1 + 1/\mu_b$ columns of $\mathbf{B}$ can be linearly dependent. Hence, the $n_e$
columns of $\mathbf{B}_{\mathcal{E}}$ must be linearly independent.



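B. $\|\mathbf{R}_{\mathcal{E}}\mathbf{a}_\ell\|_2 > 0$ for $\ell = 1, \ldots, N_a$

We have to verify that condition (18) implies $\|\mathbf{R}_{\mathcal{E}}\mathbf{a}_\ell\|_2 > 0$
for $\ell = 1, \ldots, N_a$. Since $\mathbf{R}_{\mathcal{E}}$ is a projector and, therefore,
Hermitian and idempotent, it follows that

\begin{align}
\|\mathbf{R}_{\mathcal{E}}\mathbf{a}_\ell\|_2^2 &= \mathbf{a}_\ell^H \mathbf{R}_{\mathcal{E}} \mathbf{a}_\ell \tag{37} \\
&= \left|\mathbf{a}_\ell^H \mathbf{R}_{\mathcal{E}} \mathbf{a}_\ell\right| \notag \\
&\geq 1 - \underbrace{\left|\mathbf{a}_\ell^H \mathbf{B}_{\mathcal{E}} (\mathbf{B}_{\mathcal{E}}^H \mathbf{B}_{\mathcal{E}})^{-1} \mathbf{B}_{\mathcal{E}}^H \mathbf{a}_\ell\right|}_{c_1} \tag{38}
\end{align}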



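where (37) is a consequence of $\mathbf{R}_{\mathcal{E}}^H \mathbf{R}_{\mathcal{E}} = \mathbf{R}_{\mathcal{E}}$, and (38)
follows from the reverse triangle inequality and $\|\mathbf{a}_\ell\|_2 = 1$,
$\ell = 1, \ldots, N_a$. Next, we derive an upper bound on $c_1$ according to

\begin{align}
c_1 &\leq \lambda_{\max}\!\left((\mathbf{B}_{\mathcal{E}}^H \mathbf{B}_{\mathcal{E}})^{-1}\right) \|\mathbf{B}_{\mathcal{E}}^H \mathbf{a}_\ell\|_2^2 \tag{39} \\
&\leq \lambda_{\min}^{-1}(\mathbf{B}_{\mathcal{E}}^H \mathbf{B}_{\mathcal{E}})\, n_e \mu_m^2 \tag{40}
\end{align}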



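C. Unique recovery through (P0), BP, and OMP

We now need to verify that (P0), BP, and OMP (applied
to $\hat{\mathbf{z}} = \mathbf{R}_{\mathcal{E}}\mathbf{A}\boldsymbol{\Delta}\hat{\mathbf{x}}$) recover the vector $\hat{\mathbf{x}} = \boldsymbol{\Delta}^{-1}\mathbf{x}$ provided
that (18) is satisfied. This will be accomplished by deriving an
upper bound on the coherence $\mu(\mathbf{R}_{\mathcal{E}}\mathbf{A}\boldsymbol{\Delta})$ of the modified dictionary
$\mathbf{R}_{\mathcal{E}}\mathbf{A}\boldsymbol{\Delta}$, which, via the well-known coherence-based
recovery guarantee [11]-[13]

\[
n_x < \frac{1}{2}\left(1 + \frac{1}{\mu(\mathbf{R}_{\mathcal{E}}\mathbf{A}\boldsymbol{\Delta})}\right) \tag{43}
\]

leads to a recovery threshold guaranteeing perfect recovery of
$\hat{\mathbf{x}}$. This threshold is then shown to coincide with (18). More
specifically, the well-known sparsity threshold in (4) guarantees
that the unique solution of (P0) applied to $\hat{\mathbf{z}} = \mathbf{R}_{\mathcal{E}}\mathbf{A}\boldsymbol{\Delta}\hat{\mathbf{x}}$
is given by $\hat{\mathbf{x}}$, and, furthermore, that this unique solution can
be obtained through BP and OMP if (43) holds. It is important
to note that $\|\hat{\mathbf{x}}\|_0 = \|\mathbf{x}\|_0 = n_x$. With

\[
[\boldsymbol{\Delta}]_{\ell,\ell} = \frac{1}{\|\mathbf{R}_{\mathcal{E}}\mathbf{a}_\ell\|_2}
\]

we obtain

\[
\mu(\mathbf{R}_{\mathcal{E}}\mathbf{A}\boldsymbol{\Delta}) = \max_{r,\ell,\,\ell \neq r} \frac{\left|\mathbf{a}_r^H \mathbf{R}_{\mathcal{E}}^H \mathbf{R}_{\mathcal{E}} \mathbf{a}_\ell\right|}{\|\mathbf{R}_{\mathcal{E}}\mathbf{a}_r\|_2\, \|\mathbf{R}_{\mathcal{E}}\mathbf{a}_\ell\|_2}. \tag{44}
\]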



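Next, we upper-bound the RHS of (44) by upper-bounding
its numerator and lower-bounding its denominator. For the
numerator we have

\begin{align}
\left|\mathbf{a}_r^H \mathbf{R}_{\mathcal{E}}^H \mathbf{R}_{\mathcal{E}} \mathbf{a}_\ell\right| &= \left|\mathbf{a}_r^H \mathbf{R}_{\mathcal{E}} \mathbf{a}_\ell\right| \tag{45} \\
&= \left|\mathbf{a}_r^H \mathbf{a}_\ell - \mathbf{a}_r^H \mathbf{B}_{\mathcal{E}} \mathbf{B}_{\mathcal{E}}^{\dagger} \mathbf{a}_\ell\right| \notag \\
&\leq \left|\mathbf{a}_r^H \mathbf{a}_\ell\right| + \underbrace{\left|\mathbf{a}_r^H \mathbf{B}_{\mathcal{E}} (\mathbf{B}_{\mathcal{E}}^H \mathbf{B}_{\mathcal{E}})^{-1} \mathbf{B}_{\mathcal{E}}^H \mathbf{a}_\ell\right|}_{=\,c_2} \tag{46} \\
&\leq \mu_a + c_2 \tag{47}
\end{align}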



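The case $n_x = 0$ is not interesting, as $n_x = 0$ corresponds to $\mathbf{x} = \mathbf{0}_{N_a}$
and hence recovery of $\mathbf{x} = \mathbf{0}_{N_a}$ only could be guaranteed.

where (45) follows from $\mathbf{R}_{\mathcal{E}}^H \mathbf{R}_{\mathcal{E}} = \mathbf{R}_{\mathcal{E}}$, (46) is obtained
through the triangle inequality, and (47) follows from
$|\mathbf{a}_r^H \mathbf{a}_\ell| \leq \mu_a$. Next, we derive an upper bound on $c_2$ according to

\begin{align}
c_2 &\leq \|\mathbf{B}_{\mathcal{E}}^H \mathbf{a}_r\|_2\, \left\|(\mathbf{B}_{\mathcal{E}}^H \mathbf{B}_{\mathcal{E}})^{-1} \mathbf{B}_{\mathcal{E}}^H \mathbf{a}_\ell\right\|_2 \tag{48} \\
&\leq \|\mathbf{B}_{\mathcal{E}}^H \mathbf{a}_r\|_2\, \lambda_{\max}\!\left((\mathbf{B}_{\mathcal{E}}^H \mathbf{B}_{\mathcal{E}})^{-1}\right) \|\mathbf{B}_{\mathcal{E}}^H \mathbf{a}_\ell\|_2 \tag{49}
\end{align}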



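where (48) follows from the Cauchy-Schwarz inequality
and (49) from the Rayleigh-Ritz theorem [60, Thm. 4.2.2].
Defining $i = \arg\max_r \|\mathbf{B}_{\mathcal{E}}^H \mathbf{a}_r\|_2$, we further have

\[
c_2 \leq \lambda_{\max}\!\left((\mathbf{B}_{\mathcal{E}}^H \mathbf{B}_{\mathcal{E}})^{-1}\right) \|\mathbf{B}_{\mathcal{E}}^H \mathbf{a}_i\|_2^2.
\]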



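We obtain an upper bound on $c_2$ using the same steps that
were used to bound $c_1$ in (39)-(41):

\[
c_2 \leq \frac{n_e \mu_m^2}{C_b} \tag{50}
\]

where $C_b = [1 - \mu_b(n_e - 1)]^+$. Combining (47) and (50)
leads to the following upper bound

\[
\left|\mathbf{a}_r^H \mathbf{R}_{\mathcal{E}}^H \mathbf{R}_{\mathcal{E}} \mathbf{a}_\ell\right| \leq \mu_a + \frac{n_e \mu_m^2}{C_b}. \tag{51}
\]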



Next, we derive a lower bound on the denominator on the
RHS of (44). To this end, we set $j = \arg\min_r \|\mathbf{R}_{\mathcal{E}}\mathbf{a}_r\|_2$ and
note that



Appendix D 
Proof of TheoremIt] 

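Assume that there exists an alternative vector $\mathbf{x}'$ that
satisfies $\mathbf{A}\mathbf{x}' \in \left(\{\mathbf{z}\} + \bigcup_{\mathcal{E} \in \Sigma} \mathcal{R}(\mathbf{B}_{\mathcal{E}})\right)$ (with $\Sigma = \wp_{n_e}(\{1, \ldots, N_b\})$)
with $\|\mathbf{x}'\|_0 \leq n_x$. This implies the existence
of a vector $\mathbf{e}'$ with $\|\mathbf{e}'\|_0 \leq n_e$ such that

\[
\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e} = \mathbf{A}\mathbf{x}' + \mathbf{B}\mathbf{e}'
\]

and therefore

\[
\mathbf{A}(\mathbf{x} - \mathbf{x}') = \mathbf{B}(\mathbf{e}' - \mathbf{e}).
\]

From $\|\mathbf{x}\|_0 = n_x$ and $\|\mathbf{x}'\|_0 \leq n_x$ it follows that $\|\mathbf{x} - \mathbf{x}'\|_0 \leq 2n_x$.
Similarly, $\|\mathbf{e}\|_0 = n_e$ and $\|\mathbf{e}'\|_0 \leq n_e$ imply $\|\mathbf{e}' - \mathbf{e}\|_0 \leq 2n_e$.
Defining $\mathbf{p} = \mathbf{x} - \mathbf{x}'$ and $\mathcal{P} = \operatorname{supp}(\mathbf{x} - \mathbf{x}')$, and, similarly,
$\mathbf{q} = \mathbf{e}' - \mathbf{e}$ and $\mathcal{Q} = \operatorname{supp}(\mathbf{e}' - \mathbf{e})$, we arrive at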



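\begin{align}
4 n_x n_e \geq \|\mathbf{p}\|_0 \|\mathbf{q}\|_0 = |\mathcal{P}|\,|\mathcal{Q}| &\geq \frac{\left[1 - \mu_a(|\mathcal{P}| - 1)\right]^+ \left[1 - \mu_b(|\mathcal{Q}| - 1)\right]^+}{\mu_m^2} \tag{56} \\
&\geq \frac{\left[1 - \mu_a(2n_x - 1)\right]^+ \left[1 - \mu_b(2n_e - 1)\right]^+}{\mu_m^2} = f(2n_x, 2n_e) \tag{57}
\end{align}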

where (56) follows from the uncertainty relation in Theorem 1
applied to the difference vectors $\mathbf{p}$ and $\mathbf{q}$ (with $\varepsilon_{\mathcal{P}} = \varepsilon_{\mathcal{Q}} = 0$
since both $\mathbf{p}$ and $\mathbf{q}$ are perfectly concentrated to $\mathcal{P}$ and $\mathcal{Q}$,
respectively) and (57) is a consequence of $|\mathcal{P}| \leq 2n_x$ and
$|\mathcal{Q}| \leq 2n_e$. Obviously, (57) is in contradiction to (24), which
concludes the proof.
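To get a feel for the threshold contradicted by (57), the sketch below searches for the largest $n_x$ satisfying $4 n_x n_e < f(2n_x, 2n_e)$. This form of condition (24) is inferred from (57) and is an assumption here, as are the function names, the search cap `n_max`, and the coherence values.

```python
def f(u, v, mu_a, mu_b, mu_m):
    """f(u, v) = [1 - mu_a (u - 1)]^+ [1 - mu_b (v - 1)]^+ / mu_m^2."""
    pos = lambda t: max(t, 0.0)
    return pos(1.0 - mu_a * (u - 1)) * pos(1.0 - mu_b * (v - 1)) / mu_m ** 2


def largest_nx(n_e, mu_a, mu_b, mu_m, n_max):
    """Largest n_x in 1..n_max with 4 n_x n_e < f(2 n_x, 2 n_e); 0 if none.

    For n_e >= 1 the left side grows and the right side shrinks with n_x,
    so the search can stop at the first failure.
    """
    best = 0
    for n_x in range(1, n_max + 1):
        if 4 * n_x * n_e < f(2 * n_x, 2 * n_e, mu_a, mu_b, mu_m):
            best = n_x
        else:
            break
    return best


# Hypothetical parameters (assumptions, not taken from the paper).
print(largest_nx(n_e=2, mu_a=0.05, mu_b=0.05, mu_m=0.1, n_max=50))  # 5
```

With these assumed values, $n_x = 5$ gives $40 < 46.75$, whereas $n_x = 6$ gives $48$ against $38.25$, so the threshold is first violated at $n_x = 6$.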



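\[
\|\mathbf{R}_{\mathcal{E}}\mathbf{a}_r\|_2\, \|\mathbf{R}_{\mathcal{E}}\mathbf{a}_\ell\|_2 \geq \|\mathbf{R}_{\mathcal{E}}\mathbf{a}_j\|_2^2 \geq 1 - \frac{n_e \mu_m^2}{C_b} \tag{52}
\]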



where (52) follows from (42). Finally, combining (51) and (52)
we arrive at

\[
\mu(\mathbf{R}_{\mathcal{E}}\mathbf{A}\boldsymbol{\Delta}) \leq \frac{\mu_a C_b + n_e \mu_m^2}{C_b - n_e \mu_m^2}. \tag{53}
\]



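Inserting (53) into the recovery threshold in (43), we obtain
the following threshold guaranteeing recovery of $\hat{\mathbf{x}}$ from $\hat{\mathbf{z}} = \mathbf{R}_{\mathcal{E}}\mathbf{A}\boldsymbol{\Delta}\hat{\mathbf{x}}$
through (P0), BP, and OMP:

\[
n_x < \frac{1}{2}\left(\frac{C_b(1 + \mu_a)}{\mu_a C_b + n_e \mu_m^2}\right). \tag{54}
\]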

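Since $2 n_x n_e \mu_m^2 \geq 0$, we can transform (54) into

\begin{align}
2 n_x n_e \mu_m^2 &< C_b \left[1 - \mu_a(2n_x - 1)\right]^+ \notag \\
&= \left[1 - \mu_b(n_e - 1)\right]^+ \left[1 - \mu_a(2n_x - 1)\right]^+. \tag{55}
\end{align}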
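The step from (54) to (55) is a short rearrangement: multiplying (54) by the positive denominator $\mu_a C_b + n_e\mu_m^2$ and collecting the terms in $\mu_a C_b$ gives $2 n_x n_e \mu_m^2 < C_b[1 - \mu_a(2n_x - 1)]$, and the positive part can be added because the left-hand side is nonnegative. The following sketch checks this equivalence in exact rational arithmetic over a small grid of hypothetical parameter values; the function names and the grid are illustrative assumptions.

```python
from fractions import Fraction as Fr
from itertools import product


def pos(t):
    """Positive part [t]^+ in exact arithmetic."""
    return max(t, Fr(0))


def threshold_54(n_x, n_e, mu_a, mu_b, mu_m):
    """n_x < (1/2) C_b (1 + mu_a) / (mu_a C_b + n_e mu_m^2)."""
    C_b = pos(1 - mu_b * (n_e - 1))
    return n_x < Fr(1, 2) * C_b * (1 + mu_a) / (mu_a * C_b + n_e * mu_m ** 2)


def threshold_55(n_x, n_e, mu_a, mu_b, mu_m):
    """2 n_x n_e mu_m^2 < C_b [1 - mu_a (2 n_x - 1)]^+."""
    C_b = pos(1 - mu_b * (n_e - 1))
    return 2 * n_x * n_e * mu_m ** 2 < C_b * pos(1 - mu_a * (2 * n_x - 1))


# Hypothetical coherence values; n_e >= 1 keeps the denominator in (54) positive.
mus = [Fr(1, 20), Fr(1, 10), Fr(1, 5)]
agree = all(
    threshold_54(n_x, n_e, a, b, m) == threshold_55(n_x, n_e, a, b, m)
    for n_x, n_e, a, b, m in product(range(1, 8), range(1, 8), mus, mus, mus)
)
print(agree)  # True
```

Exact fractions are used instead of floats so that borderline cases, where both sides of an inequality coincide, are not misclassified by rounding.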



Rearranging terms in (55) finally yields

\[
2 n_x n_e < f(2n_x, n_e)
\]

which proves that (18) guarantees recovery of the vector $\hat{\mathbf{x}}$
(and thus also of $\mathbf{x} = \boldsymbol{\Delta}\hat{\mathbf{x}}$) through (P0), BP, and OMP.